Still presenting regression results in tables? why not forest plots?

Data Science

In this post I reproduce an elegant JAMA forest plot using R and Python.

Mihiretu Kebede(PhD)
2025-09-27

Why Are We Still Using Endless Tables?

Every week I open papers that bury their main results in wall-to-wall regression tables: dozens of rows, tiny fonts, and confidence intervals that may even require a ruler and patience.

Let’s see this elegant forest plot presented in this article: Associations Between Diabetes and Incident Colorectal Cancer. It looks nice and easy to read, right? So what is stopping many from presenting regression results like this—clear, visual, and instantly interpretable?

A single figure tells the story: which factors matter, the direction of effect, and the uncertainty—no squinting at page-long tables.

Getting the Data

This figure contains only a handful of rows, so I simply typed the numbers by hand from the article and supplement.
That guarantees 100 % fidelity.

For larger figures you could automate extraction.
The tesseract package in R can OCR a PDF or PNG:

library(tesseract)
txt <- ocr("jama.jpg")
cat(txt)
HR Favors no | Favors Pfor
Characteristic (95% Cl) colorectal cancer colorectal cancer interaction
Income, $ 3B
>50000 1.51 (0.84-2.72) —-—.
<15000 1,72 (1.32-2.24) —
‘Smoking status 04
Never 1.10 (0.81-1.48) —
Former 2.07 (1.41-3.06) —.—
Current 1.62 (1.14-2.31) —
Obesity (BMI 230) 83
No 1.46 (1.11-1.92) —
Yes 1.48 (1.12 -1.95) —
Sex 33
Male 1,27 (0.93-1.75) —
Female 1,59 (1.24-2.04) aed
Race and ethnicity 33
Non-Hispanic Black 1.43 (1.13-1.81) Se
Non-Hispanic White 1.69 (1.15-2.47) ——
rr nn)
HR (95% Cl)

…but for eleven lines of data, manual entry wins on speed and accuracy.

df <- data.frame(
  variable = c("Income, $", "Income, $", "Smoking", "Smoking", "Smoking",
               "Obesity", "Obesity", "Sex", "Sex",
               "Race and ethnicity", "Race and ethnicity"),
  characteristic = c(">50000", "<15000", "Never", "Former", "Current",
                     "No", "Yes", "Male", "Female",
                     "Non-Hispanic Black", "Non-Hispanic White"),
  HR    = c(1.51, 1.72, 1.10, 2.07, 1.62, 1.46, 1.48, 1.27, 1.59, 1.43, 1.69),
  lower = c(0.84, 1.32, 0.81, 1.41, 1.14, 1.11, 1.12, 0.93, 1.24, 1.13, 1.15),
  upper = c(2.72, 2.24, 1.48, 3.06, 2.31, 1.92, 1.95, 1.75, 2.04, 1.81, 2.47),
  p_value = c(0.93, NA, 0.04, NA, NA, 0.83, NA, 0.33, NA, 0.33, NA)
)

df$HR_CI <- sprintf("%.2f (%.2f–%.2f)", df$HR, df$lower, df$upper)
df
             variable     characteristic   HR lower upper p_value
1           Income, $             >50000 1.51  0.84  2.72    0.93
2           Income, $             <15000 1.72  1.32  2.24      NA
3             Smoking              Never 1.10  0.81  1.48    0.04
4             Smoking             Former 2.07  1.41  3.06      NA
5             Smoking            Current 1.62  1.14  2.31      NA
6             Obesity                 No 1.46  1.11  1.92    0.83
7             Obesity                Yes 1.48  1.12  1.95      NA
8                 Sex               Male 1.27  0.93  1.75    0.33
9                 Sex             Female 1.59  1.24  2.04      NA
10 Race and ethnicity Non-Hispanic Black 1.43  1.13  1.81    0.33
11 Race and ethnicity Non-Hispanic White 1.69  1.15  2.47      NA
              HR_CI
1  1.51 (0.84–2.72)
2  1.72 (1.32–2.24)
3  1.10 (0.81–1.48)
4  2.07 (1.41–3.06)
5  1.62 (1.14–2.31)
6  1.46 (1.11–1.92)
7  1.48 (1.12–1.95)
8  1.27 (0.93–1.75)
9  1.59 (1.24–2.04)
10 1.43 (1.13–1.81)
11 1.69 (1.15–2.47)

Reproducing the Plot in R

The meta package is designed for meta-analysis, but its forest() function is perfect for any list of estimates once pooling is turned off.

library(meta)

all <- metagen(
  TE = log(df$HR),
  lower = log(df$lower),
  upper = log(df$upper),
  subgroup = df$variable,
  data = df,
  sm = "HR",
  common = FALSE,
  overall = FALSE,
  random = FALSE,
  backtransf = TRUE
)

forest(all,
       leftcols  = c("characteristic", "HR_CI"),
       leftlabs  = c("Characteristic", "HR (95% CI)"),
       rightcols = "p_value",
       rightlabs = "P for interaction",
       col.square = "darkgreen",
       weight.study = "same",
       label.left  = "Favours no colorectal cancer",
       label.right = "Favours colorectal cancer",
       fontsize=7, squaresize = 0.6,
       spacing = 0.7,
       subgroup.name = ""
       )

In Python? No problem

Let’s try to reproduce it in python. First lets bring the R data frame from above and copy it in our python interpreter.

I use reticulate package to run both R and python codes.

library(reticulate)
knitr::opts_chunk$set(echo = TRUE)
# If needed, pick a python env:
# reticulate::use_python("C:/Path/to/python.exe", required = TRUE)
# reticulate::py_install(c("pandas","matplotlib","numpy"))

Now let’s bring that R dataframe into our python interpreter.

import pandas as pd, numpy as np
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms

# bring R's df into Python
python_df = r.df.copy()
python_df.head()
    variable characteristic    HR  lower  upper  p_value             HR_CI
0  Income, $         >50000  1.51   0.84   2.72     0.93  1.51 (0.84–2.72)
1  Income, $         <15000  1.72   1.32   2.24      NaN  1.72 (1.32–2.24)
2    Smoking          Never  1.10   0.81   1.48     0.04  1.10 (0.81–1.48)
3    Smoking         Former  2.07   1.41   3.06      NaN  2.07 (1.41–3.06)
4    Smoking        Current  1.62   1.14   2.31      NaN  1.62 (1.14–2.31)

Plot in python: we will use the forestplot library from python. You can take a look at the vignettes for more details.


import forestplot as fp

ax = fp.forestplot(python_df,  # the dataframe with results data
              estimate="HR",  # col containing estimated effect size 
              ll="lower", hl="upper",  # columns containing conf. int. lower and higher limits
              varlabel="characteristic",  # column containing variable label
              capitalize="capitalize",  # Capitalize labels
              groupvar="variable",  # Add variable groupings 
              # group ordering
              annote=["HR_CI"],  # columns to report on left of plot
              annoteheaders=["HR (95%CI)"],  # ^corresponding headers
            
              group_order=["Income, $", "Smoking", "Obesity", "Sex", "Race and ethnicity"],
              table=True,  # Format as a table
              sort=True,  # sort in ascending order (sorts within group if group is specified)               
              pval="p_value",  # Column of p-value to be reported on right
              color_alt_rows=True,  # Gray alternate rows
              mcolor = "darkgreen",#color of the square
              figsize=(8, 6), #Controls the size of the figure,
              xticks=[0.5,1, 2],
              ylabel="HR (95% CI)",  # ylabel to print
              **{"ylabel1_size": 11}  # control size of printed ylabel
              )
 # add vertical line at x=1
ax.axvline(x=1, linewidth=1.5, color='black',linestyle='-') 
plt.tight_layout()
plt.show()
Traceback (most recent call last):
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\axis.py", line 1506, in convert_units
    ret = self.converter.convert(x, self.units, self)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\category.py", line 49, in convert
    raise ValueError(
ValueError: Missing category information for StrCategoryConverter; this might be caused by unintendedly mixing categorical and numeric data

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\backends\backend_qt.py", line 477, in _draw_idle
    self.draw()
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\backends\backend_agg.py", line 436, in draw
    self.figure.draw(self.renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\artist.py", line 73, in draw_wrapper
    result = draw(artist, renderer, *args, **kwargs)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\artist.py", line 50, in draw_wrapper
    return draw(artist, renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\figure.py", line 2837, in draw
    mimage._draw_list_compositing_images(
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\image.py", line 132, in _draw_list_compositing_images
    a.draw(renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\artist.py", line 50, in draw_wrapper
    return draw(artist, renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\axes\_base.py", line 3091, in draw
    mimage._draw_list_compositing_images(
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\image.py", line 132, in _draw_list_compositing_images
    a.draw(renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\artist.py", line 50, in draw_wrapper
    return draw(artist, renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\text.py", line 685, in draw
    bbox, info, descent = self._get_layout(renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\text.py", line 296, in _get_layout
    key = self._get_layout_cache_key(renderer=renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\text.py", line 280, in _get_layout_cache_key
    x, y = self.get_unitless_position()
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\text.py", line 831, in get_unitless_position
    y = float(self.convert_yunits(self._y))
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\artist.py", line 264, in convert_yunits
    return ax.yaxis.convert_units(y)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\axis.py", line 1508, in convert_units
    raise munits.ConversionError('Failed to convert value(s) to axis '
matplotlib.units.ConversionError: Failed to convert value(s) to axis units: '  Non-hispanic black  1.43 (1.13–1.81)'

Take-Home Message

Visuals persuade. A forest plot conveys magnitude, direction, and uncertainty at a glance.

Readers remember pictures, not tables.

If your model fits in a table, it fits in a figure—and your audience will thank you.

So the next time you’re tempted to drop a 12-column regression table into your manuscript, ask yourself: why not a forest plot?

Stop drowning readers in tables—start communicating with figures.

Image credit: Figure 1 from the cited JAMA article.

If you have enjoyed reading this blog post, consider subscribing for upcoming posts.

Subscribe

* indicates required