In this post, I will go through a basic introduction to the Inverse Probability Weighting and Marginal Structural Models for estimating the causal effects of a treatment on an outcome.
In my previous post I demonstrated, propensity score matching method for covariate balancing and estimating the average causal effects of treatments from observational studies. we have seen treatment assignment in observational setting is no random and this creates confounding. The propensity score effectively summarizes multiple confounders into a single scalar value. That scalar value(X) can help us achieve a matched data sets. While finding the matched observations, we have to discard unmatched individuals from our sample. Therefore the data sets can be much lower than the original unmatched data sets leading to: reduced precision. weaker generalizability,and possibly losing rare treatment subgroups.
In contrast to propensity score matching, Inverse Probability of Treatment Weighting (IPTW) offers an attractive solution and avoids discarding subjects. Instead of removing unmatched individuals, IPTW re-weights each observation by the inverse of its probability of receiving the treatment (1/propensity score value for the treated and 1/(1-propensity score value for control subjects)).
In addition, observational studies, treatment is often associated with prognosis even after conditioning on baseline covariates. This is especially problematic in longitudinal settings, where treatment and covariates both change over time and influence each other.
Classical regression adjustment can fail in the presence of treatment–covariate feedback. Adjusting for time varying covariates in tradional regression adjustments may block part of the treatment effect and introduce bias. Inverse Probability of Treatment Weighting (IPTW) and Marginal Structural Models (MSM) offer an effective method to tackle this bias. In this post, we focus on the single time point case, which will illustrate the key ideas. The extension to time varying treatment follows the same logic.
Let
The causal effects of interest are, for example:
Average treatment effect (ATE)
\[
ATE = E[Y(1) - Y(0)]
\]
Average treatment effect on the treated (ATT)
\[
ATT = E[Y(1) - Y(0) \mid T = 1]
\]
In the single time point setting, the key assumption is again conditional exchangeability:
\[ Y(1), Y(0) \perp T \mid X \]
The propensity score is
\[ e(X) = P(T = 1 \mid X) \]
IPTW will help us create a weighted pseudo population in which treatment is independent of measured covariates. For the ATE, the basic unstabilized weights are
\[ w_i = \begin{cases} \frac{1}{e(X_i)} & \text{if } T_i = 1 \\ \frac{1}{1 - e(X_i)} & \text{if } T_i = 0 \end{cases} \]
In this pseudo population, treated and untreated individuals have the same distribution of the scalar \(X\), so simple weighted comparisons of outcomes can be unbiased for the marginal causal effect, under the usual assumptions.
In practice, we often use stabilised weights to reduce variance, and we often truncate extreme weights to improve numerical stability.
A Marginal Structural Model (MSM) models the relationship between treatment and potential outcomes at the population level. For a binary treatment and outcome \(Y\), a simple MSM can be modelled as
\[ g\{ E[Y^a] \} = \beta_0 + \beta_1 a \]
where
We fit MSM using weighted regression with IPTW. The weights make treatment independent of the covariates (X), so the regression captures the marginal causal effect rather than a conditional association.
For a continuous outcome, we use a Gaussian model with identity link and interpret \(\beta_1\) as a causal mean difference (MD).
Let’s draw a DAG for our example dataset. We use a right heart catheterization (RHC) example. Let treatment indicate whether a patient received RHC in the ICU, and died indicate death. Several baseline characteristics affect both the decision to perform RHC and the risk of death.

The DAG shows that comorbidities and severity markers are confounders for the effect of RHC on mortality. IPTW uses these variables to construct a pseudo population where treatment is independent of these confounders. Now let’s proceed to our analysis.
Load data and prepare variables
# install.packages("tableone", repos = "https://cloud.r-project.org")
# install.packages("ipw", repos = "https://cloud.r-project.org")
# install.packages("sandwich", repos = "https://cloud.r-project.org")
# install.packages("survey", repos = "https://cloud.r-project.org")
# install.packages("cobalt", repos = "https://cloud.r-project.org")
library(dplyr)
library(tableone) # for showing the SMD values
library(ipw) #for running iptw
library(sandwich) # robust variance
library(survey)
library(cobalt)
# Load example data
load(url("https://hbiostat.org/data/repo/rhc.sav"))
# Create binary indicators and numeric variables
mydata <- rhc %>%
mutate(
ARF = as.numeric(cat1 == "ARF"),
CHF = as.numeric(cat1 == "CHF"),
Cirr = as.numeric(cat1 == "Cirrhosis"),
colcan = as.numeric(cat1 == "Colon Cancer"),
Coma = as.numeric(cat1 == "Coma"),
lungcan = as.numeric(cat1 == "Lung Cancer"),
MOSF = as.numeric(cat1 == "MOSF w/Malignancy"),
sepsis = as.numeric(cat1 == "MOSF w/Sepsis"),
female = as.numeric(sex == "Female"),
died = as.numeric(death == "Yes"),
treatment = as.numeric(swang1 == "RHC"),
meanbp1 = rhc$meanbp1,
aps = rhc$aps1
) %>%
select(
ARF, CHF, Cirr, colcan, Coma, lungcan, MOSF, sepsis,
age, female, meanbp1, aps, treatment, died
)
Unweighted baseline characteristics
xvars <- c("ARF", "CHF", "Cirr", "colcan", "Coma",
"lungcan", "MOSF", "sepsis",
"age", "female", "meanbp1", "aps")
table_unw <- CreateTableOne(vars = xvars, strata = "treatment",
data = mydata, test = FALSE)
print(table_unw, smd = TRUE)
Stratified by treatment
0 1 SMD
n 3551 2184
ARF (mean (SD)) 0.45 (0.50) 0.42 (0.49) 0.059
CHF (mean (SD)) 0.07 (0.25) 0.10 (0.29) 0.095
Cirr (mean (SD)) 0.05 (0.22) 0.02 (0.15) 0.145
colcan (mean (SD)) 0.00 (0.04) 0.00 (0.02) 0.038
Coma (mean (SD)) 0.10 (0.29) 0.04 (0.20) 0.207
lungcan (mean (SD)) 0.01 (0.10) 0.00 (0.05) 0.095
MOSF (mean (SD)) 0.07 (0.25) 0.07 (0.26) 0.018
sepsis (mean (SD)) 0.15 (0.36) 0.32 (0.47) 0.415
female (mean (SD)) 0.46 (0.50) 0.41 (0.49) 0.093
As you can see from the SMDs, there is imbalance it some of our prognostic variables (SMD>0.1) between patients who did and did not receive RHC.
As we did in my previous tutorial, let’s now do the PSM model.
psmodel <- glm(treatment ~ age + female + meanbp1 +
ARF + CHF + Cirr + colcan +
Coma + lungcan + MOSF + sepsis,
family = binomial(link = "logit"),
data = mydata)
summary(psmodel)
Call:
glm(formula = treatment ~ age + female + meanbp1 + ARF + CHF +
Cirr + colcan + Coma + lungcan + MOSF + sepsis, family = binomial(link = "logit"),
data = mydata)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7013 -1.0097 -0.6336 1.1814 2.4791
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.7299670 0.1997692 -3.654 0.000258 ***
age -0.0031374 0.0017289 -1.815 0.069567 .
female -0.1697903 0.0583574 -2.909 0.003620 **
meanbp1 -0.0109824 0.0008217 -13.366 < 2e-16 ***
ARF 1.2931956 0.1487784 8.692 < 2e-16 ***
CHF 1.6804704 0.1715672 9.795 < 2e-16 ***
Cirr 0.5234506 0.2181458 2.400 0.016416 *
colcan 0.0295468 1.0985361 0.027 0.978542
Coma 0.7013451 0.1854937 3.781 0.000156 ***
lungcan -0.0869570 0.5039331 -0.173 0.863000
MOSF 1.3046587 0.1772705 7.360 1.84e-13 ***
sepsis 2.0433604 0.1545437 13.222 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 7621.4 on 5734 degrees of freedom
Residual deviance: 6983.6 on 5723 degrees of freedom
AIC: 7007.6
Number of Fisher Scoring iterations: 4
ps <- predict(psmodel, type = "response")
ps[1:5] #To see the propensity score vales for five observations
1 2 3 4 5
0.1977132 0.5514264 0.4095798 0.3900890 0.5954348
We now estimated the propensity score ps values for the observations which is the probability of receiving RHC treatment given our baseline covariates (age + female, meanbp1, ARF, CHF, Cirr , colcan ,Coma , lungcan, MOSF ,sepsis).
With the following code, we calculate the unstabilied weights. We calculate these weights by simply taking the reciprocal of the propensity(1/ps) scores for the the ones who didn’t receive the treatment and 1/(1-ps) for the those subjects who didn’t receive the treatment.
#Unstabilised ATE weights
w_iptw <- ifelse(mydata$treatment == 1,
1 / ps, 1/(1 - ps))
summary(w_iptw)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.046 1.405 1.721 2.001 2.280 21.606
These weights help us create a pseudo population in which treatment assignment is independent of the measured confounders, under our causal assumptions.
design_w <- svydesign(ids = ~1, data = mydata, weights = ~w_iptw)
table_w <- svyCreateTableOne(vars = xvars, strata = "treatment",
data = design_w, test = FALSE)
print(table_w, smd = TRUE)
Stratified by treatment
0 1 SMD
n 5732.49 5744.88
ARF (mean (SD)) 0.44 (0.50) 0.44 (0.50) 0.010
CHF (mean (SD)) 0.08 (0.27) 0.08 (0.27) 0.005
Cirr (mean (SD)) 0.04 (0.19) 0.04 (0.19) 0.001
colcan (mean (SD)) 0.00 (0.04) 0.00 (0.06) 0.042
Coma (mean (SD)) 0.08 (0.26) 0.07 (0.25) 0.023
lungcan (mean (SD)) 0.01 (0.08) 0.01 (0.09) 0.014
MOSF (mean (SD)) 0.07 (0.26) 0.07 (0.26) 0.004
sepsis (mean (SD)) 0.21 (0.41) 0.22 (0.41) 0.002
female (mean (SD)) 0.45 (0.50) 0.45 (0.50) 0.001
We can see the SMDs are now closer to zero which tells us that we have improved the balance.
# Using cobalt directly with weights
bal_iptw <- bal.tab(treatment ~ ARF + CHF + Cirr +
colcan + Coma +lungcan + MOSF + sepsis +
age + female + meanbp1 + aps,
data = mydata,
weights = w_iptw,
estimand = "ATE")
love.plot(bal_iptw, stats = "mean.diffs",
abs = TRUE, threshold = 0.1)

The love plot shows how IPTW reduces imbalance. Absolute mean difference below 0.1 are usually considered acceptable.
To estimate the causal risk RR, we run a regression of the outcome variable (died) as a function of the treatment variable. We add our iptw weights in the glm. After that we will already have the coefficients.
The vcovHC() from the sandwich package computes a heteroskedasticity-consistent variance–covariance matrix. It tels to use the Basic sandwitch method to calculate the SE. HC0works well and is default for large samples. For small samples, HC3 is often used. Once we have the SE of our estimate 𝞫 , calculating the causal RR and its 95%CI is straightforward.
glm_rr <- glm(died ~ treatment,
weights = w_iptw,
family = binomial(link = "log"),
data = mydata)
beta_rr <- coef(glm_rr)
SE_rr <- sqrt(diag(vcovHC(glm_rr, type = "HC0")))
causal_rr <- exp(beta_rr["treatment"])
lcl_rr <- exp(beta_rr["treatment"] - 1.96 * SE_rr["treatment"])
ucl_rr <- exp(beta_rr["treatment"] + 1.96 * SE_rr["treatment"])
c(RR = causal_rr, LCL = lcl_rr, UCL = ucl_rr)
RR.treatment LCL.treatment UCL.treatment
1.081764 1.036698 1.128790
Here exp(β1)(_1)exp(β1) is the marginal causal risk ratio for death comparing RHC to no RHC in the weighted pseudo population. You can see the treatment increase death by 8%.
To calculate the causal risk difference, we run similar analysis as above but we now set the link function to identity. The rest is simple.
glm_rd <- glm(died ~ treatment,
weights = w_iptw,
family = binomial(link = "identity"),
data = mydata)
beta_rd <- coef(glm_rd)
SE_rd <- sqrt(diag(vcovHC(glm_rd, type = "HC0")))
causal_rd <- beta_rd["treatment"]
lcl_rd <- beta_rd["treatment"] - 1.96 * SE_rd["treatment"]
ucl_rd <- beta_rd["treatment"] + 1.96 * SE_rd["treatment"]
c(RD = causal_rd, LCL = lcl_rd, UCL = ucl_rd)
RD.treatment LCL.treatment UCL.treatment
0.05154951 0.02333223 0.07976679
Here β1_1β1 is the marginal causal risk difference. A negative value tells us that treatment reduces risk, a positive value indicates increased risk. We found the causal risk difference of 0.051 which indicates catheterization leads to an estimated 5.15 percent absolute increase in the risk of death, under the assumptions of the MSM.
For ease of interpretation, we can also convert this causal risk difference to “number needed to harm” by taking the reciprocal of the causal risk difference.
NNH=RD = 1/0.05151≈19.4
That means if we treated about 19–20 patients with RHC, we would cause one additional death, compared to those not treated by RHC. A causal RD of 0.0515 indicates treating by RHC increases the absolute risk of the death by 5.15 percentage points in the population, which is equivalent to causing 5 extra death events for every 100 treated subjects.
library(ggplot2)
ggplot(mydata, aes(x = w_iptw)) +
geom_histogram(bins = 50) +
labs(x = "IPTW weight",
title = "Distribution of unstabilised IPTW weights") +
theme_minimal()

As you can see from the plot, we have some extreme value of iptw weights. Larger weights in Inverse Probability of Treatment Weighting indicate that specific observations in the study have a low probability of receiving the treatment they actually received given the set of covariates. These extreme weights mean that a few, often unrepresentative, observations are heavily driving the causal effect estimate, which can lead to high variance, increased bias, and unstable results. A small number of extreme weights can dominate the analysis and increase variance. So we need to truncate these weights to reduce this driver bias.
We can truncate extreme weights to reduce instability. We remove the ones having weights of more than 10 and then we redo our causal RR and RD calculations.
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.046 1.405 1.721 1.991 2.280 10.000
ggplot(mydata, aes(x = w_trunc)) +
geom_histogram(bins = 50) +
labs(x = "Truncated IPTW weight",
title = "Distribution of truncated IPTW weights") +
theme_minimal()

glm_rr <- glm(died ~ treatment,
weights = w_trunc,
family = binomial(link = "log"),
data = mydata)
beta_rr <- coef(glm_rr)
SE_rr <- sqrt(diag(vcovHC(glm_rr, type = "HC0")))
causal_rr <- exp(beta_rr["treatment"])
lcl_rr <- exp(beta_rr["treatment"] - 1.96 * SE_rr["treatment"])
ucl_rr <- exp(beta_rr["treatment"] + 1.96 * SE_rr["treatment"])
c(RR = causal_rr, LCL = lcl_rr, UCL = ucl_rr)
RR.treatment LCL.treatment UCL.treatment
1.087130 1.043412 1.132680
Comparing it with our previous causal RR estimate, you can see the one calculated using the truncated weights has more precission.
glm_rd_trunc <- glm(died ~ treatment,
weights = w_trunc,
family = binomial(link = "identity"),
data = mydata)
beta_rd_trunc <- coef(glm_rd_trunc)
SE_rd_trunc <- sqrt(diag(vcovHC(glm_rd_trunc, type = "HC0")))
causal_rd_trunc <- beta_rd_trunc["treatment"]
lcl_rd_trunc <- beta_rd_trunc["treatment"] - 1.96 * SE_rd_trunc["treatment"]
ucl_rd_trunc <- beta_rd_trunc["treatment"] + 1.96 * SE_rd_trunc["treatment"]
c(RD = causal_rd_trunc, LCL = lcl_rd_trunc, UCL = ucl_rd_trunc)
RD.treatment LCL.treatment UCL.treatment
0.05493243 0.02768882 0.08217605
Similar to what we observed in our causal RR estimates using the truncated weights, our causal risk difference estimate after removing observations with extreme iptw gives us a more precise estimate. Therefore, the truncated analysis provides us improved precision and numerical stability.
ipw packageWe can also the calculate the weights using the ipw package.
weightmodel <- ipwpoint(
exposure = treatment,
family = "binomial",
link = "logit",
denominator = ~ age + female + meanbp1 +
ARF + CHF + Cirr + colcan +
Coma + lungcan + MOSF + sepsis,
data = mydata
)
summary(weightmodel$ipw.weights)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.046 1.405 1.721 2.001 2.280 21.606
ipwplot(weights = weightmodel$ipw.weights,
logscale = FALSE,
main = "IPW weights from ipwpoint",
xlim = c(0, 22))

survey with these weightsWe can the run our marginal structural model and obtain similar results using the survey package.
# Survey design for MSM
mydata$w_ipw <- weightmodel$ipw.weights #we attache the weights to our data frame
design_ipw <- svydesign(ids = ~1, weights = ~w_ipw, data = mydata)
# MSM with log link to get causal RR
msm_rr <- svyglm(died ~ treatment,
design = design_ipw,
family = quasibinomial(link = "log"))
# Extract causal RR
RR <- exp(coef(msm_rr)["treatment"])
CI <- exp(confint(msm_rr)["treatment", ])
RR
treatment
1.081764
CI
2.5 % 97.5 %
1.036685 1.128804
design_ipw <- svydesign(ids = ~1, weights = ~w_ipw, data = mydata)
msm_ipw <- svyglm(died ~ treatment, design = design_ipw,
family = quasibinomial(link = "identity"))
coef(msm_ipw)
(Intercept) treatment
0.63046375 0.05154951
confint(msm_ipw)
2.5 % 97.5 %
(Intercept) 0.61401097 0.64691653
treatment 0.02332433 0.07977469
#Let's print ignoring the intercept
coef(msm_ipw)["treatment"]
treatment
0.05154951
confint(msm_ipw)["treatment",]
2.5 % 97.5 %
0.02332433 0.07977469
The coefficient for treatment again represents the marginal causal risk difference. However, we didn’t truncate the weights.
ipwpointweightmodel_trunc <- ipwpoint(
exposure = treatment,
family = "binomial",
link = "logit",
denominator = ~ age + female + meanbp1 +
ARF + CHF + Cirr + colcan +
Coma + lungcan + MOSF + sepsis,
data = mydata,
trunc = 0.01 # trims at 1st and 99th percentile
)
summary(weightmodel_trunc$weights.trunc)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.081 1.405 1.721 1.972 2.280 6.379
ipwplot(weights = weightmodel_trunc$weights.trunc,
logscale = FALSE,
main = "Truncated IPW weights",
xlim = c(0, 22))

We proceed to calculating causal RR and causal odds ratio using the truncated weights
mydata$w_trunc_ipw <- weightmodel_trunc$weights.trunc
design_trunc <- svydesign(ids = ~1, weights = ~w_trunc_ipw, data = mydata) #use the truncated weights
msm_rr <- svyglm(died ~ treatment,
design = design_ipw,
family = quasibinomial(link = "log"))
# Extract causal RR
exp(coef(msm_rr)["treatment"])
treatment
1.081764
2.5 % 97.5 %
1.036685 1.128804
paste(
"Causal RR:", round(exp(coef(msm_rr)["treatment"]),3),
",", "95%CI:","(",
round(exp(confint(msm_rr)["treatment", ])[1],3), ",",
round(exp(confint(msm_rr)["treatment", ])[2],3), ")") #lower and upper
[1] "Causal RR: 1.082 , 95%CI: ( 1.037 , 1.129 )"
#Risk difference
msm_trunc <- svyglm(died ~ treatment, design = design_trunc,
family = quasibinomial(link = "identity"))
coef_rd <- coef(msm_trunc)["treatment"] |>round(4)
confint_rd <- confint(msm_trunc)["treatment",] |>round(4)
paste("Causal RD:", coef_rd, ",", "95%CI:","(",
confint_rd[1], ",",
confint_rd[2], ")") #lower and upper
[1] "Causal RD: 0.0549 , 95%CI: ( 0.0282 , 0.0817 )"
Until now, we have seen how to compute causal risk ration and causal risk difference using two different methods. The outcome variable was death (died) which was binary outcome. We have seen comparable output using the two methods. How about continuous outcome? We follow the same procedure for continuous outcomes Y. The steps are identical:
Fit a propensity score model for treatment.
Construct IPTW weights.
Fit a weighted MSM with a Gaussian outcome model
Our causal effect measure will be marginal causal mean difference.
We can do this using the lalonde dataset. I used this dataset in my previous post on propensity score matching. The outcome: re78 = real earnings in 1978
is continuous.
black hispan white
243 72 299
# Create race indicators
df$black <- as.numeric(df$race == "black")
df$hispan <- as.numeric(df$race == "hispan")
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.009 1.052 1.170 1.905 1.623 40.077
MSMs for continuous outcomes use a Gaussian model with identity link.
This means we estimate:
E[Y(a)]=β0+β1a
or equivalently:
[Y∣do(A=a)]=β0+β1a
design_lalonde <- svydesign(ids = ~1,
weights = ~w_iptw, data = df)
Call:
svyglm(formula = re78 ~ treat, design = design_lalonde, family = gaussian())
Survey design:
svydesign(ids = ~1, weights = ~w_iptw, data = df)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6422.8 365.3 17.584 <2e-16 ***
treat 224.7 910.2 0.247 0.805
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 50992187)
Number of Fisher Scoring iterations: 2
After standardising treatment assignment using IPTW, we see that participation in the job training program didnot causally increases the 1978 earnings, compared to those who were not participating.
IPTW uses the inverse of the propensity score to construct a pseudo population in which treatment is independent of set of suffcient confounders.
Marginal structural models are fitted using weighted regression to estimate population average causal effects.
For binary outcomes, identity and log links give causal risk difference and causal risk ratio, respectively.
For continuous outcomes, a Gaussian MSM with identity link gives us the margiinal causal mean difference.
Diagnostics such as love plots and weight distributions can be used to assess overlap and the stability of the estimated weights.
Truncation of extreme weights can improve precision at the cost of a small bias, compare to the simple propensity score matching method.
Finally, I would like to mention that this post focuses on a single time point treatment variable, but the ideas extend to longitudinal settings with time varying treatments and covariates.
If you have enjoyed reading this, consider subscribing for upcoming posts so you won’t miss any.