In this post I reproduce an elegant JAMA forest plot—originally created in SAS—using R.
Every week I open papers that bury their main results in wall-to-wall regression tables: dozens of rows, tiny fonts, and confidence intervals that practically require a ruler and some patience to read.
Take a look at the elegant forest plot presented in this article: Associations Between Diabetes and Incident Colorectal Cancer. It looks nice and easy to read, right? So what stops so many of us from presenting regression results like this: clear, visual, and instantly interpretable?
A single figure tells the story: which factors matter, the direction of effect, and the uncertainty—no squinting at page-long tables.
This figure contains only a handful of rows, so I simply typed the numbers by hand from the article and supplement.
That guarantees 100% fidelity.
For larger figures you could automate extraction.
The tesseract package in R can OCR a PDF or PNG.
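Here is a minimal sketch, assuming the figure has been saved locally as figure1.png (the filename is a placeholder, not part of the original workflow):

library(tesseract)

# For a PDF you would first render the relevant page to an image,
# e.g. with pdftools::pdf_convert("article.pdf", dpi = 300), then OCR the resulting PNG.
eng <- tesseract("eng")                       # English OCR engine
raw_text <- ocr("figure1.png", engine = eng)  # returns a single character string
cat(raw_text)

Run against this figure, the result is raw, unstructured text, roughly along these lines: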
HR Favors no | Favors Pfor
Characteristic (95% Cl) colorectal cancer colorectal cancer interaction
Income, $ 3B
>50000 1.51 (0.84-2.72) —-—.
<15000 1,72 (1.32-2.24) —
‘Smoking status 04
Never 1.10 (0.81-1.48) —
Former 2.07 (1.41-3.06) —.—
Current 1.62 (1.14-2.31) —
Obesity (BMI 230) 83
No 1.46 (1.11-1.92) —
Yes 1.48 (1.12 -1.95) —
Sex 33
Male 1,27 (0.93-1.75) —
Female 1,59 (1.24-2.04) aed
Race and ethnicity 33
Non-Hispanic Black 1.43 (1.13-1.81) Se
Non-Hispanic White 1.69 (1.15-2.47) ——
rr nn)
HR (95% Cl)
…but for eleven lines of data, manual entry wins on speed and accuracy.
# Hazard ratios, 95% CIs, and P values for interaction, transcribed from the figure
df <- data.frame(
  variable = c("Income, $", "Income, $", "Smoking", "Smoking", "Smoking",
               "Obesity", "Obesity", "Sex", "Sex",
               "Race and ethnicity", "Race and ethnicity"),
  characteristic = c(">50000", "<15000", "Never", "Former", "Current",
                     "No", "Yes", "Male", "Female",
                     "Non-Hispanic Black", "Non-Hispanic White"),
  HR      = c(1.51, 1.72, 1.10, 2.07, 1.62, 1.46, 1.48, 1.27, 1.59, 1.43, 1.69),
  lower   = c(0.84, 1.32, 0.81, 1.41, 1.14, 1.11, 1.12, 0.93, 1.24, 1.13, 1.15),
  upper   = c(2.72, 2.24, 1.48, 3.06, 2.31, 1.92, 1.95, 1.75, 2.04, 1.81, 2.47),
  p_value = c(0.93, NA, 0.04, NA, NA, 0.83, NA, 0.33, NA, 0.33, NA)
)

# Pre-format the "HR (95% CI)" label column shown on the left of the plot
df$HR_CI <- sprintf("%.2f (%.2f–%.2f)", df$HR, df$lower, df$upper)

df
             variable     characteristic   HR lower upper p_value            HR_CI
1           Income, $             >50000 1.51  0.84  2.72    0.93 1.51 (0.84–2.72)
2           Income, $             <15000 1.72  1.32  2.24      NA 1.72 (1.32–2.24)
3              Smoking              Never 1.10  0.81  1.48    0.04 1.10 (0.81–1.48)
4              Smoking             Former 2.07  1.41  3.06      NA 2.07 (1.41–3.06)
5              Smoking            Current 1.62  1.14  2.31      NA 1.62 (1.14–2.31)
6              Obesity                 No 1.46  1.11  1.92    0.83 1.46 (1.11–1.92)
7              Obesity                Yes 1.48  1.12  1.95      NA 1.48 (1.12–1.95)
8                  Sex               Male 1.27  0.93  1.75    0.33 1.27 (0.93–1.75)
9                  Sex             Female 1.59  1.24  2.04      NA 1.59 (1.24–2.04)
10  Race and ethnicity Non-Hispanic Black 1.43  1.13  1.81    0.33 1.43 (1.13–1.81)
11  Race and ethnicity Non-Hispanic White 1.69  1.15  2.47      NA 1.69 (1.15–2.47)
The meta package is designed for meta-analysis, but its forest() function is perfect for any list of estimates once pooling is turned off.
library(meta)

# metagen() expects effects on the log scale; backtransf = TRUE displays them as HRs
all <- metagen(
  TE       = log(df$HR),      # point estimates (log hazard ratios)
  lower    = log(df$lower),   # lower confidence limits (log scale)
  upper    = log(df$upper),   # upper confidence limits (log scale)
  subgroup = df$variable,     # one subgroup per characteristic
  data     = df,
  sm       = "HR",
  common   = FALSE,           # no common-effect pooling
  overall  = FALSE,           # no overall estimate
  random   = FALSE,           # no random-effects pooling
  backtransf = TRUE
)
forest(all,
       leftcols  = c("characteristic", "HR_CI"),
       leftlabs  = c("Characteristic", "HR (95% CI)"),
       rightcols = "p_value",
       rightlabs = "P for interaction",
       col.square = "darkgreen",
       weight.study = "same",          # equal square sizes: no meta-analytic weighting
       label.left  = "Favours no colorectal cancer",
       label.right = "Favours colorectal cancer",
       fontsize   = 7,
       squaresize = 0.6,
       spacing    = 0.7,
       subgroup.name = ""              # print subgroup labels without a prefix
)
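To write the figure to a file rather than the interactive graphics window, wrapping the call in a standard base-R graphics device should work; a sketch (the filename and dimensions are arbitrary choices, and the forest() call is trimmed here for brevity):

png("forest_plot.png", width = 2400, height = 1600, res = 300)  # open a PNG device
forest(all,
       leftcols  = c("characteristic", "HR_CI"),
       leftlabs  = c("Characteristic", "HR (95% CI)"),
       rightcols = "p_value",
       rightlabs = "P for interaction")   # same call as above, abbreviated
dev.off()                                 # close the device and write the file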
Take-Home Message
Visuals persuade. A forest plot conveys magnitude, direction, and uncertainty at a glance.
Readers remember pictures, not tables.
If your model fits in a table, it fits in a figure—and your audience will thank you.
So the next time you’re tempted to drop a 12-column regression table into your manuscript, ask yourself: why not a forest plot?
Stop drowning readers in tables—start communicating with figures.
Image credit: Figure 1 from the cited JAMA article.
If you have enjoyed reading this blog post, consider subscribing for upcoming posts.