Here is my solution to the Lab assessment for today, based on our Causal Inference in Mediation seminar.
library(tidyverse)
library(lavaan)
population.model <- '# Population model: a = .25, b = .2, no direct x -> y path
x ~ .3*I1 + .3*I2 + .55*z1
m ~ .3*I3 + .3*I4 + .25*x + .55*z1 + .55*z2
y ~ .2*m + .55*z2'
# Generate data
set.seed(080203)
my.df <- simulateData(population.model, sample.nobs = 10000)
# Test the model
test.model <- 'm ~ a * x # a path
y ~ b * m # b path
y ~ c_p * x # c prime path
# Instrument paths
x ~ I1 + I2
m ~ I3 + I4
# Omitted variable paths
x ~ z1
m ~ z1 + z2
y ~ z2
# Specified parameters
ab := a*b
c := c_p + (a * b)'
test.fit <- sem(test.model, data = my.df)
summary(test.fit)
## lavaan (0.5-23.1097) converged normally after 14 iterations
##
## Number of observations 10000
##
## Estimator ML
## Minimum Function Test Statistic 8.659
## Degrees of freedom 10
## P-value (Chi-square) 0.565
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## m ~
## x (a) 0.260 0.009 27.954 0.000
## y ~
## m (b) 0.204 0.009 23.854 0.000
## x (c_p) 0.007 0.009 0.765 0.444
## x ~
## I1 0.286 0.010 28.670 0.000
## I2 0.283 0.010 28.390 0.000
## m ~
## I3 0.287 0.010 28.696 0.000
## I4 0.290 0.010 29.086 0.000
## x ~
## z1 0.543 0.010 54.119 0.000
## m ~
## z1 0.540 0.011 47.906 0.000
## z2 0.547 0.010 54.445 0.000
## y ~
## z2 0.541 0.011 48.506 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .m 0.989 0.014 70.711 0.000
## .y 1.006 0.014 70.711 0.000
## .x 0.984 0.014 70.711 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## ab 0.053 0.003 18.145 0.000
## c 0.060 0.009 6.859 0.000
Note that the \(c'\) path is essentially zero (\(c'\) = 0.007; p = .444), just as specified in the population model, where x affects y only through m.
mediation.ov.model <- 'm ~ a * x # a path
y ~ b * m # b path
y ~ c_p * x # c prime path
# Correct omitted variable paths
x ~ z1
m ~ z1 + z2
y ~ z2
# Specified parameters
ab := a*b
c := c_p + (a * b)'
mediation.ov.fit <- sem(mediation.ov.model, data = my.df)
summary(mediation.ov.fit)
## lavaan (0.5-23.1097) converged normally after 18 iterations
##
## Number of observations 10000
##
## Estimator ML
## Minimum Function Test Statistic 0.218
## Degrees of freedom 2
## P-value (Chi-square) 0.897
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## m ~
## x (a) 0.262 0.010 26.137 0.000
## y ~
## m (b) 0.204 0.009 23.850 0.000
## x (c_p) 0.007 0.009 0.765 0.444
## x ~
## z1 0.546 0.011 50.383 0.000
## m ~
## z1 0.536 0.012 44.077 0.000
## z2 0.549 0.011 50.669 0.000
## y ~
## z2 0.541 0.011 48.498 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .m 1.153 0.016 70.711 0.000
## .y 1.006 0.014 70.711 0.000
## .x 1.146 0.016 70.711 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## ab 0.053 0.003 17.618 0.000
## c 0.060 0.009 6.893 0.000
Note that in this model we’ve omitted the instruments, which changes the residual variances of x and m (they no longer have those additional predictors). Nonetheless, our model shows no evidence of misspecification (\(\chi^2\) = 0.218; df = 2; p = .897), and the parameter estimates are very close to those from the model we specified previously.
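As an aside, if you just want those fit statistics without the full summary, lavaan’s fitMeasures() will pull them directly; a minimal sketch:
fitMeasures(mediation.ov.fit, c("chisq", "df", "pvalue", "cfi", "rmsea"))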
mediation.bk.model <- 'm ~ a * x # a path
y ~ b * m # b path
y ~ c_p * x # c prime path
# Specified parameters
ab := a*b
c := c_p + (a * b)'
mediation.bk.fit <- sem(mediation.bk.model, data = my.df)
summary(mediation.bk.fit)
## lavaan (0.5-23.1097) converged normally after 14 iterations
##
## Number of observations 10000
##
## Estimator ML
## Minimum Function Test Statistic 0.000
## Degrees of freedom 0
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## m ~
## x (a) 0.461 0.011 42.800 0.000
## y ~
## m (b) 0.377 0.009 43.745 0.000
## x (c_p) -0.073 0.010 -7.192 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .m 1.670 0.024 70.711 0.000
## .y 1.242 0.018 70.711 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## ab 0.174 0.006 30.593 0.000
## c 0.101 0.010 9.986 0.000
Here we see a substantially inflated indirect effect (ab = .174; p < .001) compared to the correctly specified model. Further, we observe a spurious direct effect of x on y (\(c'\) = -.073; p < .001), a Type I error that leads to an incorrect conclusion about the ‘total effect’ under the Baron and Kenny (B&K) framework.
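To make that comparison concrete, here’s a quick sketch that pulls the ab estimate from each of the three fits (map_dfr() comes from purrr, loaded with the tidyverse):
# Collect the three fits and extract the defined ab parameter from each
fits <- list(full = test.fit, omitted.vars = mediation.ov.fit,
             baron.kenny = mediation.bk.fit)
map_dfr(fits, ~ parameterEstimates(.x) %>%
          filter(label == "ab") %>%
          select(est, se, pvalue),
        .id = "model")
You should see ab of roughly .053 for the first two fits and the inflated .174 for the B&K fit.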
mediation.iv.model <- 'm ~ a * x # a path
y ~ b * m # b path
# Instrumental variable paths
x ~ I1 + I2
m ~ I3 + I4
# Specify the error term covariances
x ~~ m
m ~~ y
ab := a*b'
mediation.iv.fit <- sem(mediation.iv.model, data = my.df)
summary(mediation.iv.fit, rsquare = TRUE)
## lavaan (0.5-23.1097) converged normally after 27 iterations
##
## Number of observations 10000
##
## Estimator ML
## Minimum Function Test Statistic 7.089
## Degrees of freedom 7
## P-value (Chi-square) 0.420
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## m ~
## x (a) 0.272 0.030 8.979 0.000
## y ~
## m (b) 0.228 0.016 13.895 0.000
## x ~
## I1 0.286 0.011 25.525 0.000
## I2 0.287 0.011 25.557 0.000
## m ~
## I3 0.278 0.012 22.824 0.000
## I4 0.298 0.012 24.595 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## .m ~~
## .x 0.271 0.041 6.599 0.000
## .y 0.247 0.028 8.731 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .m 1.557 0.027 56.709 0.000
## .y 1.280 0.020 64.533 0.000
## .x 1.273 0.018 70.711 0.000
##
## R-Square:
## Estimate
## m 0.212
## y 0.144
## x 0.114
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## ab 0.062 0.008 7.446 0.000
Our model shows no evidence of misspecification (\(\chi^2\) = 7.089; df = 7; p = .420), so that’s a good thing. We can also see the \(\zeta\) (error) covariances between x and m and between m and y (0.271 and 0.247). That these covariances are statistically significant is exactly what we would expect, given that z1 and z2 are omitted from this model.
But I’ve added the R\(^2\) output here to show the explained variance in our focal variables. We could split this apart and derive the \(F\) statistic for x and m, which would give us an estimate of the joint strength (validity) of our instruments. Here we’ve got some inflation in the indirect path (ab = 0.062): while it’s close to the ideal (ab = 0.053), it’s still biased. This is a case of weaker instruments not doing a great job for us.
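Here’s a rough sketch of that derivation for x, assuming its R\(^2\) comes entirely from the two instruments (which it does in this model, since z1 is excluded):
# Implied first-stage F for x: F = (R2/k) / ((1 - R2)/(n - k - 1))
r2.x <- 0.114  # R-square for x from the output above
k <- 2         # number of instruments for x (I1 and I2)
n <- 10000
(r2.x / k) / ((1 - r2.x) / (n - k - 1))
That works out to roughly F = 643, far above the conventional F > 10 rule of thumb; with N = 10,000 even these modest loadings carry a lot of information, which is why the bias in ab is small rather than dramatic.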
Go back and look at my code for the specified model. I set the loading of the instruments at .3. That’s not all that strong. For an experiment, go back to that code and set the loadings higher, say above .5 for each of them, and see what you observe with the indirect effect in the 2SLS model. Then set the instrument loadings quite low, say .1, and see what happens.
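If you want a head start on that experiment, here’s a sketch with the loadings raised to .6 (hypothetical object names; everything else reused from above):
# Same population model, but with stronger instrument loadings
strong.model <- 'x ~ .6*I1 + .6*I2 + .55*z1
m ~ .6*I3 + .6*I4 + .25*x + .55*z1 + .55*z2
y ~ .2*m + .55*z2'
set.seed(080203)
strong.df <- simulateData(strong.model, sample.nobs = 10000)
strong.fit <- sem(mediation.iv.model, data = strong.df)
subset(parameterEstimates(strong.fit), label == "ab")
Swap the .6s for .1s to see the weak-instrument case.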
So I’ll retain this model for my example, but it’s a good lesson that stronger instruments are always best!
Now, construct a confidence interval around the indirect effect using the Delta-method (i.e., Sobel) standard error:
# 95% CI: estimate +/- 1.96 * s.e. (ab = 0.062, s.e. = 0.008 from the summary above)
ci.ll <- 0.062 - (1.96 * 0.008)
ci.ul <- 0.062 + (1.96 * 0.008)
cat("Confidence interval for indirect effect (Delta s.e): ", ci.ll, ci.ul)
## Confidence interval for indirect effect (Delta s.e): 0.04632 0.07768
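Rather than hard-coding those numbers, you can also pull the estimate and its standard error straight from parameterEstimates(); a small sketch:
# Extract the ab row and compute the 95% CI from its Delta-method s.e.
pe <- parameterEstimates(mediation.iv.fit)
ab.pe <- pe[pe$label == "ab", ]
ab.pe$est + c(-1.96, 1.96) * ab.pe$se
Next, let’s get bootstrapped standard errors for comparison: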
mediation.ivboot.fit <- sem(mediation.iv.model, data = my.df,
se = "bootstrap", bootstrap = 5000)
parameterEstimates(mediation.ivboot.fit)
## lhs op rhs label est se z pvalue ci.lower ci.upper
## 1 m ~ x a 0.272 0.031 8.893 0 0.212 0.332
## 2 y ~ m b 0.228 0.016 13.898 0 0.194 0.259
## 3 x ~ I1 0.286 0.011 25.277 0 0.265 0.309
## 4 x ~ I2 0.287 0.011 25.966 0 0.266 0.309
## 5 m ~ I3 0.278 0.012 22.738 0 0.254 0.303
## 6 m ~ I4 0.298 0.012 24.588 0 0.274 0.321
## 7 m ~~ x 0.271 0.042 6.497 0 0.189 0.352
## 8 m ~~ y 0.247 0.028 8.754 0 0.193 0.304
## 9 m ~~ m 1.557 0.028 55.986 0 1.505 1.614
## 10 y ~~ y 1.280 0.020 65.472 0 1.242 1.319
## 11 x ~~ x 1.273 0.018 71.695 0 1.238 1.308
## 12 I1 ~~ I1 0.992 0.000 NA NA 0.992 0.992
## 13 I1 ~~ I2 0.009 0.000 NA NA 0.009 0.009
## 14 I1 ~~ I3 -0.009 0.000 NA NA -0.009 -0.009
## 15 I1 ~~ I4 0.021 0.000 NA NA 0.021 0.021
## 16 I2 ~~ I2 0.988 0.000 NA NA 0.988 0.988
## 17 I2 ~~ I3 0.015 0.000 NA NA 0.015 0.015
## 18 I2 ~~ I4 -0.006 0.000 NA NA -0.006 -0.006
## 19 I3 ~~ I3 0.986 0.000 NA NA 0.986 0.986
## 20 I3 ~~ I4 -0.006 0.000 NA NA -0.006 -0.006
## 21 I4 ~~ I4 0.994 0.000 NA NA 0.994 0.994
## 22 ab := a*b ab 0.062 0.009 7.238 0 0.046 0.080
Here we see a bootstrapped confidence interval of .046 to .080, which is very close to the Delta-method interval. Finally, let’s try robust standard errors:
mediation.ivrobust.fit <- sem(mediation.iv.model, data = my.df, se = "robust.sem")
parameterEstimates(mediation.ivrobust.fit)
## lhs op rhs label est se z pvalue ci.lower ci.upper
## 1 m ~ x a 0.272 0.032 8.560 0.000 0.210 0.335
## 2 y ~ m b 0.228 0.016 13.930 0.000 0.196 0.260
## 3 x ~ I1 0.286 0.012 23.110 0.000 0.262 0.311
## 4 x ~ I2 0.287 0.012 23.190 0.000 0.263 0.311
## 5 m ~ I3 0.278 0.013 21.067 0.000 0.252 0.303
## 6 m ~ I4 0.298 0.013 22.698 0.000 0.272 0.324
## 7 m ~~ x 0.271 0.043 6.241 0.000 0.186 0.356
## 8 m ~~ y 0.247 0.028 8.799 0.000 0.192 0.302
## 9 m ~~ m 1.557 0.028 55.220 0.000 1.502 1.613
## 10 y ~~ y 1.280 0.020 65.315 0.000 1.241 1.318
## 11 x ~~ x 1.273 0.018 70.326 0.000 1.237 1.308
## 12 I1 ~~ I1 0.992 0.000 NA NA 0.992 0.992
## 13 I1 ~~ I2 0.009 0.000 NA NA 0.009 0.009
## 14 I1 ~~ I3 -0.009 0.000 NA NA -0.009 -0.009
## 15 I1 ~~ I4 0.021 0.000 NA NA 0.021 0.021
## 16 I2 ~~ I2 0.988 0.000 NA NA 0.988 0.988
## 17 I2 ~~ I3 0.015 0.000 NA NA 0.015 0.015
## 18 I2 ~~ I4 -0.006 0.000 NA NA -0.006 -0.006
## 19 I3 ~~ I3 0.986 0.000 NA NA 0.986 0.986
## 20 I3 ~~ I4 -0.006 0.000 NA NA -0.006 -0.006
## 21 I4 ~~ I4 0.994 0.000 NA NA 0.994 0.994
## 22 m ~1 -0.014 0.013 -1.053 0.292 -0.040 0.012
## 23 y ~1 0.011 0.011 0.931 0.352 -0.012 0.033
## 24 x ~1 0.008 0.012 0.702 0.483 -0.015 0.032
## 25 I1 ~1 0.014 0.000 NA NA 0.014 0.014
## 26 I2 ~1 -0.009 0.000 NA NA -0.009 -0.009
## 27 I3 ~1 -0.002 0.000 NA NA -0.002 -0.002
## 28 I4 ~1 -0.003 0.000 NA NA -0.003 -0.003
## 29 ab := a*b ab 0.062 0.009 7.179 0.000 0.045 0.079
Here the robust standard errors give a confidence interval of .045 to .079, very close to both the Delta-method and bootstrap intervals.
The key takeaway here is that, particularly in large samples, all three methods produce estimates that are very close. For giggles, you can estimate the same model with N = 150 and observe the differences. Generally, though, robust and bootstrap standard errors remain very close.
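A sketch of that small-sample experiment, if you want to try it (hypothetical object names):
# Re-fit the IV model on a small simulated sample and compare the
# Delta, bootstrap, and robust standard errors for ab
set.seed(080203)
small.df <- simulateData(population.model, sample.nobs = 150)
small.delta <- sem(mediation.iv.model, data = small.df)
small.boot <- sem(mediation.iv.model, data = small.df,
                  se = "bootstrap", bootstrap = 5000)
small.robust <- sem(mediation.iv.model, data = small.df, se = "robust.sem")
subset(parameterEstimates(small.delta), label == "ab")
subset(parameterEstimates(small.boot), label == "ab")
subset(parameterEstimates(small.robust), label == "ab")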