In this post I reproduce an elegant JAMA forest plot using R and Python.
Every week I open papers that bury their main results in wall-to-wall regression tables: dozens of rows, tiny fonts, and confidence intervals that may even require a ruler and patience.
Let’s see this elegant forest plot presented in this article: Associations Between Diabetes and Incident Colorectal Cancer. It looks nice and easy to read, right? So what is stopping many from presenting regression results like this—clear, visual, and instantly interpretable?

A single figure tells the story: which factors matter, the direction of effect, and the uncertainty—no squinting at page-long tables.
This figure contains only a handful of rows, so I simply typed the numbers by hand from the article and supplement.
That guarantees 100 % fidelity.
For larger figures you could automate extraction.
The tesseract package in R can OCR a PDF or PNG:
HR Favors no | Favors Pfor
Characteristic (95% Cl) colorectal cancer colorectal cancer interaction
Income, $ 3B
>50000 1.51 (0.84-2.72) —-—.
<15000 1,72 (1.32-2.24) —
‘Smoking status 04
Never 1.10 (0.81-1.48) —
Former 2.07 (1.41-3.06) —.—
Current 1.62 (1.14-2.31) —
Obesity (BMI 230) 83
No 1.46 (1.11-1.92) —
Yes 1.48 (1.12 -1.95) —
Sex 33
Male 1,27 (0.93-1.75) —
Female 1,59 (1.24-2.04) aed
Race and ethnicity 33
Non-Hispanic Black 1.43 (1.13-1.81) Se
Non-Hispanic White 1.69 (1.15-2.47) ——
rr nn)
HR (95% Cl)
…but for eleven lines of data, manual entry wins on speed and accuracy.
df <- data.frame(
variable = c("Income, $", "Income, $", "Smoking", "Smoking", "Smoking",
"Obesity", "Obesity", "Sex", "Sex",
"Race and ethnicity", "Race and ethnicity"),
characteristic = c(">50000", "<15000", "Never", "Former", "Current",
"No", "Yes", "Male", "Female",
"Non-Hispanic Black", "Non-Hispanic White"),
HR = c(1.51, 1.72, 1.10, 2.07, 1.62, 1.46, 1.48, 1.27, 1.59, 1.43, 1.69),
lower = c(0.84, 1.32, 0.81, 1.41, 1.14, 1.11, 1.12, 0.93, 1.24, 1.13, 1.15),
upper = c(2.72, 2.24, 1.48, 3.06, 2.31, 1.92, 1.95, 1.75, 2.04, 1.81, 2.47),
p_value = c(0.93, NA, 0.04, NA, NA, 0.83, NA, 0.33, NA, 0.33, NA)
)
df$HR_CI <- sprintf("%.2f (%.2f–%.2f)", df$HR, df$lower, df$upper)
df
variable characteristic HR lower upper p_value
1 Income, $ >50000 1.51 0.84 2.72 0.93
2 Income, $ <15000 1.72 1.32 2.24 NA
3 Smoking Never 1.10 0.81 1.48 0.04
4 Smoking Former 2.07 1.41 3.06 NA
5 Smoking Current 1.62 1.14 2.31 NA
6 Obesity No 1.46 1.11 1.92 0.83
7 Obesity Yes 1.48 1.12 1.95 NA
8 Sex Male 1.27 0.93 1.75 0.33
9 Sex Female 1.59 1.24 2.04 NA
10 Race and ethnicity Non-Hispanic Black 1.43 1.13 1.81 0.33
11 Race and ethnicity Non-Hispanic White 1.69 1.15 2.47 NA
HR_CI
1 1.51 (0.84–2.72)
2 1.72 (1.32–2.24)
3 1.10 (0.81–1.48)
4 2.07 (1.41–3.06)
5 1.62 (1.14–2.31)
6 1.46 (1.11–1.92)
7 1.48 (1.12–1.95)
8 1.27 (0.93–1.75)
9 1.59 (1.24–2.04)
10 1.43 (1.13–1.81)
11 1.69 (1.15–2.47)
The meta package is designed for meta-analysis, but its forest() function is perfect for any list of estimates once pooling is turned off.
library(meta)
all <- metagen(
TE = log(df$HR),
lower = log(df$lower),
upper = log(df$upper),
subgroup = df$variable,
data = df,
sm = "HR",
common = FALSE,
overall = FALSE,
random = FALSE,
backtransf = TRUE
)
forest(all,
leftcols = c("characteristic", "HR_CI"),
leftlabs = c("Characteristic", "HR (95% CI)"),
rightcols = "p_value",
rightlabs = "P for interaction",
col.square = "darkgreen",
weight.study = "same",
label.left = "Favours no colorectal cancer",
label.right = "Favours colorectal cancer",
fontsize=7, squaresize = 0.6,
spacing = 0.7,
subgroup.name = ""
)

Let’s try to reproduce it in python. First lets bring the R data frame from above and copy it in our python interpreter.
I use reticulate package to run both R and python codes.
library(reticulate)
knitr::opts_chunk$set(echo = TRUE)
# If needed, pick a python env:
# reticulate::use_python("C:/Path/to/python.exe", required = TRUE)
# reticulate::py_install(c("pandas","matplotlib","numpy"))
Now let’s bring that R dataframe into our python interpreter.
import pandas as pd, numpy as np
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms
# bring R's df into Python
python_df = r.df.copy()
python_df.head() variable characteristic HR lower upper p_value HR_CI
0 Income, $ >50000 1.51 0.84 2.72 0.93 1.51 (0.84–2.72)
1 Income, $ <15000 1.72 1.32 2.24 NaN 1.72 (1.32–2.24)
2 Smoking Never 1.10 0.81 1.48 0.04 1.10 (0.81–1.48)
3 Smoking Former 2.07 1.41 3.06 NaN 2.07 (1.41–3.06)
4 Smoking Current 1.62 1.14 2.31 NaN 1.62 (1.14–2.31)
Plot in python: we will use the forestplot library from python. You can take a look at the vignettes for more details.
import forestplot as fp
ax = fp.forestplot(python_df, # the dataframe with results data
estimate="HR", # col containing estimated effect size
ll="lower", hl="upper", # columns containing conf. int. lower and higher limits
varlabel="characteristic", # column containing variable label
capitalize="capitalize", # Capitalize labels
groupvar="variable", # Add variable groupings
# group ordering
annote=["HR_CI"], # columns to report on left of plot
annoteheaders=["HR (95%CI)"], # ^corresponding headers
group_order=["Income, $", "Smoking", "Obesity", "Sex", "Race and ethnicity"],
table=True, # Format as a table
sort=True, # sort in ascending order (sorts within group if group is specified)
pval="p_value", # Column of p-value to be reported on right
color_alt_rows=True, # Gray alternate rows
mcolor = "darkgreen",#color of the square
figsize=(8, 6), #Controls the size of the figure,
xticks=[0.5,1, 2],
ylabel="HR (95% CI)", # ylabel to print
**{"ylabel1_size": 11} # control size of printed ylabel
)
# add vertical line at x=1
ax.axvline(x=1, linewidth=1.5, color='black',linestyle='-')
plt.tight_layout()
plt.show()Traceback (most recent call last):
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\axis.py", line 1506, in convert_units
ret = self.converter.convert(x, self.units, self)
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\category.py", line 49, in convert
raise ValueError(
ValueError: Missing category information for StrCategoryConverter; this might be caused by unintendedly mixing categorical and numeric data
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\backends\backend_qt.py", line 477, in _draw_idle
self.draw()
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\backends\backend_agg.py", line 436, in draw
self.figure.draw(self.renderer)
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\artist.py", line 73, in draw_wrapper
result = draw(artist, renderer, *args, **kwargs)
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\artist.py", line 50, in draw_wrapper
return draw(artist, renderer)
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\figure.py", line 2837, in draw
mimage._draw_list_compositing_images(
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\image.py", line 132, in _draw_list_compositing_images
a.draw(renderer)
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\artist.py", line 50, in draw_wrapper
return draw(artist, renderer)
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\axes\_base.py", line 3091, in draw
mimage._draw_list_compositing_images(
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\image.py", line 132, in _draw_list_compositing_images
a.draw(renderer)
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\artist.py", line 50, in draw_wrapper
return draw(artist, renderer)
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\text.py", line 685, in draw
bbox, info, descent = self._get_layout(renderer)
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\text.py", line 296, in _get_layout
key = self._get_layout_cache_key(renderer=renderer)
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\text.py", line 280, in _get_layout_cache_key
x, y = self.get_unitless_position()
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\text.py", line 831, in get_unitless_position
y = float(self.convert_yunits(self._y))
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\artist.py", line 264, in convert_yunits
return ax.yaxis.convert_units(y)
File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\axis.py", line 1508, in convert_units
raise munits.ConversionError('Failed to convert value(s) to axis '
matplotlib.units.ConversionError: Failed to convert value(s) to axis units: ' Non-hispanic black 1.43 (1.13–1.81)'

Take-Home Message
Visuals persuade. A forest plot conveys magnitude, direction, and uncertainty at a glance.
Readers remember pictures, not tables.
If your model fits in a table, it fits in a figure—and your audience will thank you.
So the next time you’re tempted to drop a 12-column regression table into your manuscript, ask yourself: why not a forest plot?
Stop drowning readers in tables—start communicating with figures.
Image credit: Figure 1 from the cited JAMA article.
If you have enjoyed reading this blog post, consider subscribing for upcoming posts.