Why Are We Still Using Endless Tables?

Every week I open papers that bury their main results in wall-to-wall regression tables: dozens of rows, tiny fonts, and confidence intervals that may even require a ruler and patience.

Let’s see this elegant forest plot presented in this article: Associations Between Diabetes and Incident Colorectal Cancer. It looks nice and easy to read, right? So what is stopping many from presenting regression results like this—clear, visual, and instantly interpretable?

A single figure tells the story: which factors matter, the direction of effect, and the uncertainty—no squinting at page-long tables.

Getting the Data

This figure contains only a handful of rows, so I simply typed the numbers by hand from the article and supplement.
That guarantees 100 % fidelity.

For larger figures you could automate extraction.
The tesseract package in R can OCR a PDF or PNG:

library(tesseract)
txt <- ocr("jama.jpg")
cat(txt)

HR Favors no | Favors Pfor
Characteristic (95% Cl) colorectal cancer colorectal cancer interaction
Income, $ 3B
>50000 1.51 (0.84-2.72) —-—.
<15000 1,72 (1.32-2.24) —
‘Smoking status 04
Never 1.10 (0.81-1.48) —
Former 2.07 (1.41-3.06) —.—
Current 1.62 (1.14-2.31) —
Obesity (BMI 230) 83
No 1.46 (1.11-1.92) —
Yes 1.48 (1.12 -1.95) —
Sex 33
Male 1,27 (0.93-1.75) —
Female 1,59 (1.24-2.04) aed
Race and ethnicity 33
Non-Hispanic Black 1.43 (1.13-1.81) Se
Non-Hispanic White 1.69 (1.15-2.47) ——
rr nn)
HR (95% Cl)

…but for eleven lines of data, manual entry wins on speed and accuracy.

df <- data.frame(
  variable = c("Income, $", "Income, $", "Smoking", "Smoking", "Smoking",
               "Obesity", "Obesity", "Sex", "Sex",
               "Race and ethnicity", "Race and ethnicity"),
  characteristic = c(">50000", "<15000", "Never", "Former", "Current",
                     "No", "Yes", "Male", "Female",
                     "Non-Hispanic Black", "Non-Hispanic White"),
  HR    = c(1.51, 1.72, 1.10, 2.07, 1.62, 1.46, 1.48, 1.27, 1.59, 1.43, 1.69),
  lower = c(0.84, 1.32, 0.81, 1.41, 1.14, 1.11, 1.12, 0.93, 1.24, 1.13, 1.15),
  upper = c(2.72, 2.24, 1.48, 3.06, 2.31, 1.92, 1.95, 1.75, 2.04, 1.81, 2.47),
  p_value = c(0.93, NA, 0.04, NA, NA, 0.83, NA, 0.33, NA, 0.33, NA)
)

df$HR_CI <- sprintf("%.2f (%.2f–%.2f)", df$HR, df$lower, df$upper)
df

             variable     characteristic   HR lower upper p_value
1           Income, $             >50000 1.51  0.84  2.72    0.93
2           Income, $             <15000 1.72  1.32  2.24      NA
3             Smoking              Never 1.10  0.81  1.48    0.04
4             Smoking             Former 2.07  1.41  3.06      NA
5             Smoking            Current 1.62  1.14  2.31      NA
6             Obesity                 No 1.46  1.11  1.92    0.83
7             Obesity                Yes 1.48  1.12  1.95      NA
8                 Sex               Male 1.27  0.93  1.75    0.33
9                 Sex             Female 1.59  1.24  2.04      NA
10 Race and ethnicity Non-Hispanic Black 1.43  1.13  1.81    0.33
11 Race and ethnicity Non-Hispanic White 1.69  1.15  2.47      NA
              HR_CI
1  1.51 (0.84–2.72)
2  1.72 (1.32–2.24)
3  1.10 (0.81–1.48)
4  2.07 (1.41–3.06)
5  1.62 (1.14–2.31)
6  1.46 (1.11–1.92)
7  1.48 (1.12–1.95)
8  1.27 (0.93–1.75)
9  1.59 (1.24–2.04)
10 1.43 (1.13–1.81)
11 1.69 (1.15–2.47)

Reproducing the Plot in R

The meta package is designed for meta-analysis, but its forest() function is perfect for any list of estimates once pooling is turned off.

library(meta)

all <- metagen(
  TE = log(df$HR),
  lower = log(df$lower),
  upper = log(df$upper),
  subgroup = df$variable,
  data = df,
  sm = "HR",
  common = FALSE,
  overall = FALSE,
  random = FALSE,
  backtransf = TRUE
)

forest(all,
       leftcols  = c("characteristic", "HR_CI"),
       leftlabs  = c("Characteristic", "HR (95% CI)"),
       rightcols = "p_value",
       rightlabs = "P for interaction",
       col.square = "darkgreen",
       weight.study = "same",
       label.left  = "Favours no colorectal cancer",
       label.right = "Favours colorectal cancer",
       fontsize=7, squaresize = 0.6,
       spacing = 0.7,
       subgroup.name = ""
       )

In Python? No problem

Let’s try to reproduce it in python. First lets bring the R data frame from above and copy it in our python interpreter.

I use reticulate package to run both R and python codes.

library(reticulate)
knitr::opts_chunk$set(echo = TRUE)
# If needed, pick a python env:
# reticulate::use_python("C:/Path/to/python.exe", required = TRUE)
# reticulate::py_install(c("pandas","matplotlib","numpy"))

Now let’s bring that R dataframe into our python interpreter.

import pandas as pd, numpy as np
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms

# bring R's df into Python
python_df = r.df.copy()
python_df.head()

    variable characteristic    HR  lower  upper  p_value             HR_CI
0  Income, $         >50000  1.51   0.84   2.72     0.93  1.51 (0.84–2.72)
1  Income, $         <15000  1.72   1.32   2.24      NaN  1.72 (1.32–2.24)
2    Smoking          Never  1.10   0.81   1.48     0.04  1.10 (0.81–1.48)
3    Smoking         Former  2.07   1.41   3.06      NaN  2.07 (1.41–3.06)
4    Smoking        Current  1.62   1.14   2.31      NaN  1.62 (1.14–2.31)

Plot in python: we will use the forestplot library from python. You can take a look at the vignettes for more details.


import forestplot as fp

ax = fp.forestplot(python_df,  # the dataframe with results data
              estimate="HR",  # col containing estimated effect size 
              ll="lower", hl="upper",  # columns containing conf. int. lower and higher limits
              varlabel="characteristic",  # column containing variable label
              capitalize="capitalize",  # Capitalize labels
              groupvar="variable",  # Add variable groupings 
              # group ordering
              annote=["HR_CI"],  # columns to report on left of plot
              annoteheaders=["HR (95%CI)"],  # ^corresponding headers
            
              group_order=["Income, $", "Smoking", "Obesity", "Sex", "Race and ethnicity"],
              table=True,  # Format as a table
              sort=True,  # sort in ascending order (sorts within group if group is specified)               
              pval="p_value",  # Column of p-value to be reported on right
              color_alt_rows=True,  # Gray alternate rows
              mcolor = "darkgreen",#color of the square
              figsize=(8, 6), #Controls the size of the figure,
              xticks=[0.5,1, 2],
              ylabel="HR (95% CI)",  # ylabel to print
              **{"ylabel1_size": 11}  # control size of printed ylabel
              )
 # add vertical line at x=1
ax.axvline(x=1, linewidth=1.5, color='black',linestyle='-') 
plt.tight_layout()
plt.show()

Traceback (most recent call last):
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\axis.py", line 1506, in convert_units
    ret = self.converter.convert(x, self.units, self)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\category.py", line 49, in convert
    raise ValueError(
ValueError: Missing category information for StrCategoryConverter; this might be caused by unintendedly mixing categorical and numeric data

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\backends\backend_qt.py", line 477, in _draw_idle
    self.draw()
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\backends\backend_agg.py", line 436, in draw
    self.figure.draw(self.renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\artist.py", line 73, in draw_wrapper
    result = draw(artist, renderer, *args, **kwargs)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\artist.py", line 50, in draw_wrapper
    return draw(artist, renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\figure.py", line 2837, in draw
    mimage._draw_list_compositing_images(
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\image.py", line 132, in _draw_list_compositing_images
    a.draw(renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\artist.py", line 50, in draw_wrapper
    return draw(artist, renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\axes\_base.py", line 3091, in draw
    mimage._draw_list_compositing_images(
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\image.py", line 132, in _draw_list_compositing_images
    a.draw(renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\artist.py", line 50, in draw_wrapper
    return draw(artist, renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\text.py", line 685, in draw
    bbox, info, descent = self._get_layout(renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\text.py", line 296, in _get_layout
    key = self._get_layout_cache_key(renderer=renderer)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\text.py", line 280, in _get_layout_cache_key
    x, y = self.get_unitless_position()
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\text.py", line 831, in get_unitless_position
    y = float(self.convert_yunits(self._y))
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\artist.py", line 264, in convert_yunits
    return ax.yaxis.convert_units(y)
  File "C:\Users\mihir\ANACON~3\lib\site-packages\matplotlib\axis.py", line 1508, in convert_units
    raise munits.ConversionError('Failed to convert value(s) to axis '
matplotlib.units.ConversionError: Failed to convert value(s) to axis units: '  Non-hispanic black  1.43 (1.13–1.81)'

Take-Home Message

Visuals persuade. A forest plot conveys magnitude, direction, and uncertainty at a glance.

Readers remember pictures, not tables.

If your model fits in a table, it fits in a figure—and your audience will thank you.

So the next time you’re tempted to drop a 12-column regression table into your manuscript, ask yourself: why not a forest plot?

Stop drowning readers in tables—start communicating with figures.

Image credit: Figure 1 from the cited JAMA article.

If you have enjoyed reading this blog post, consider subscribing for upcoming posts.

Still presenting regression results in tables? why not forest plots?

Why Are We Still Using Endless Tables?

Getting the Data

Reproducing the Plot in R

In Python? No problem

Subscribe