Here is my solution to the Lab assessment for today, based on our Causal Inference in Mediation seminar.


Deliverable

  • Generate a dataset of 10,000 observations based on a model that you specify with…
    • A continuous dependent variable (y)
    • A continuous predictor variable (x)
    • A continuous predictor (m)
  • Specify a mediation model of x —> m —> y, and you may use my code below. In your specification, include a common cause, z1, between x and m; and another common cause, z2, between m and y.
  • In your specification, also include instrument variables (two each) for x and for m.
  • Evaluate a proper specification to include the z1 and z2 (without instruments).
  • Evaluate a naive mediation model using the Baron and Kenny method in an SEM estimator and compare this result to the proper specification.
  • Evaluate a 2SLS mediation model and compare this with the proper specification and the B&K method, to include the indirect effect (ab path) and confidence intervals. There is no need to include the direct effect \(c'\) path at this point.
  • Estimate your 2SLS model with bootstrapped standard errors with 5,000 replications and report the resulting indirect effect and confidence interval.
  • Estimate your 2SLS model with robust standard errors, report the resulting indirect effect and confidence interval, and compare the result to the bootstrap model.

Generate dataset

library(tidyverse)
library(lavaan)
population.model <- 'x ~ .3*I1 + .3*I2 + .55*z1
                     m ~ .3*I3 + .3*I4 + .25*x + .55*z1 + .55*z2
                     y ~ .2*m + .55*z2'
# Generate data
set.seed(080203)
my.df <- simulateData(population.model, sample.nobs = 10000)

# Test the model
test.model <- 'm ~ a * x  # a path
               y ~ b * m  # b path
               y ~ c_p * x  # c prime path

               # Instrument paths
               x ~ I1 + I2
               m ~ I3 + I4

               # Omitted variable paths
               x ~ z1
               m ~ z1 + z2
               y ~ z2

               # Specified parameters
               ab := a*b
               c := c_p + (a * b)'
test.fit <- sem(test.model, data = my.df)
summary(test.fit)
## lavaan (0.5-23.1097) converged normally after  14 iterations
## 
##   Number of observations                         10000
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic                8.659
##   Degrees of freedom                                10
##   P-value (Chi-square)                           0.565
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   m ~                                                 
##     x          (a)    0.260    0.009   27.954    0.000
##   y ~                                                 
##     m          (b)    0.204    0.009   23.854    0.000
##     x        (c_p)    0.007    0.009    0.765    0.444
##   x ~                                                 
##     I1                0.286    0.010   28.670    0.000
##     I2                0.283    0.010   28.390    0.000
##   m ~                                                 
##     I3                0.287    0.010   28.696    0.000
##     I4                0.290    0.010   29.086    0.000
##   x ~                                                 
##     z1                0.543    0.010   54.119    0.000
##   m ~                                                 
##     z1                0.540    0.011   47.906    0.000
##     z2                0.547    0.010   54.445    0.000
##   y ~                                                 
##     z2                0.541    0.011   48.506    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .m                 0.989    0.014   70.711    0.000
##    .y                 1.006    0.014   70.711    0.000
##    .x                 0.984    0.014   70.711    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     ab                0.053    0.003   18.145    0.000
##     c                 0.060    0.009    6.859    0.000

Note that the \(c'\) path is essentially zero.


Proper mediation model

mediation.ov.model <- 'm ~ a * x  # a path
                       y ~ b * m  # b path
                       y ~ c_p * x  # c prime path

                       # Correct omitted variable paths
                       x ~ z1
                       m ~ z1 + z2
                       y ~ z2
                       
                       # Specified parameters
                       ab := a*b
                       c := c_p + (a * b)'
mediation.ov.fit <- sem(mediation.ov.model, data = my.df)
summary(mediation.ov.fit)
## lavaan (0.5-23.1097) converged normally after  18 iterations
## 
##   Number of observations                         10000
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic                0.218
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.897
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   m ~                                                 
##     x          (a)    0.262    0.010   26.137    0.000
##   y ~                                                 
##     m          (b)    0.204    0.009   23.850    0.000
##     x        (c_p)    0.007    0.009    0.765    0.444
##   x ~                                                 
##     z1                0.546    0.011   50.383    0.000
##   m ~                                                 
##     z1                0.536    0.012   44.077    0.000
##     z2                0.549    0.011   50.669    0.000
##   y ~                                                 
##     z2                0.541    0.011   48.498    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .m                 1.153    0.016   70.711    0.000
##    .y                 1.006    0.014   70.711    0.000
##    .x                 1.146    0.016   70.711    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     ab                0.053    0.003   17.618    0.000
##     c                 0.060    0.009    6.893    0.000

Note that in this model we’ve omitted the instruments, so that’s going to change the variance in x and m (without the additional predictors). Nonetheless, our model shows no evidence of misspecification (\(\chi\)2 = 0.218; p = .897; df = 2), and the parameter estimates are very close to the model we specified previously.


Naive (B&K) mediation model

mediation.bk.model <- 'm ~ a * x  # a path
                       y ~ b * m  # b path
                       y ~ c_p * x  # c prime path
  
                       # Specified parameters
                       ab := a*b
                       c := c_p + (a * b)'
mediation.bk.fit <- sem(mediation.bk.model, data = my.df)
summary(mediation.bk.fit)
## lavaan (0.5-23.1097) converged normally after  14 iterations
## 
##   Number of observations                         10000
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic                0.000
##   Degrees of freedom                                 0
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   m ~                                                 
##     x          (a)    0.461    0.011   42.800    0.000
##   y ~                                                 
##     m          (b)    0.377    0.009   43.745    0.000
##     x        (c_p)   -0.073    0.010   -7.192    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .m                 1.670    0.024   70.711    0.000
##    .y                 1.242    0.018   70.711    0.000
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     ab                0.174    0.006   30.593    0.000
##     c                 0.101    0.010    9.986    0.000

Here we see a substantially inflated indirect effect (ab = .174; p < .001) compared to the correctly specified model. Further, we observe a spurious direct effect between x and y, leading to a Type I error and an incorrect conclusion of the ‘total effect’ under the B&K framework.


2SLS mediation model

mediation.iv.model <- 'm ~ a * x  # a path
                       y ~ b * m  # b path

                       # Instrument variable paths
                       x ~ I1 + I2
                       m ~ I3 + I4

                       # Specify the error term covariances
                       x ~~ m
                       m ~~ y

                       ab := a*b'
mediation.iv.fit <- sem(mediation.iv.model, data = my.df)
summary(mediation.iv.fit, rsquare = TRUE)
## lavaan (0.5-23.1097) converged normally after  27 iterations
## 
##   Number of observations                         10000
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic                7.089
##   Degrees of freedom                                 7
##   P-value (Chi-square)                           0.420
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   m ~                                                 
##     x          (a)    0.272    0.030    8.979    0.000
##   y ~                                                 
##     m          (b)    0.228    0.016   13.895    0.000
##   x ~                                                 
##     I1                0.286    0.011   25.525    0.000
##     I2                0.287    0.011   25.557    0.000
##   m ~                                                 
##     I3                0.278    0.012   22.824    0.000
##     I4                0.298    0.012   24.595    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##  .m ~~                                                
##    .x                 0.271    0.041    6.599    0.000
##    .y                 0.247    0.028    8.731    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .m                 1.557    0.027   56.709    0.000
##    .y                 1.280    0.020   64.533    0.000
##    .x                 1.273    0.018   70.711    0.000
## 
## R-Square:
##                    Estimate
##     m                 0.212
##     y                 0.144
##     x                 0.114
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     ab                0.062    0.008    7.446    0.000

Our model shows no evidence of misspecification (\(\chi\)2 = 7.089; p = .420; df = 7), so that’s a good thing. We can also see the \(\zeta\) covariances between x and m, and m to y (0.271 and 0.247). That these covariances are statistically significant is exactly what we would expect.

But, I’ve added the R2 output here to show the explained variance in our focal variables. We could split this apart and derive the \(F\) value for x and m, which would give us the joint validity estimation of our instruments. Here, we’ve got some inflation in the indirect path (ab = 0.062), so while it’s close to the ideal (ab = 0.053), it’s still biased. This is a case of weaker instruments not doing a great job for us.

Go back and look at my code for the specified model. I set the loading of the instruments at .3. That’s not all that strong. For an experiment, go back to that code and set the loading higher, say above .5 for each of them, and see what you observe with the indirect effect in the 2SLS model. Then, go back and set the instrument loading to be quit low, say .1, and see what happens.

So I’ll retain this model for my example, but it’s a good lesson that stronger instruments are always best!

Now, construct a confidence interval around the Delta corrected (i.e., Sobel) standard errors for the indirect effect:

ci.ll <- 0.062 - (1.96 * 0.008)
ci.ul <- 0.062 + (1.96 * 0.008)
cat("Confidence interval for indirect effect (Delta s.e): ", ci.ll, ci.ul)
## Confidence interval for indirect effect (Delta s.e):  0.04632 0.07768

Bootstrapped standard errors

mediation.ivboot.fit <- sem(mediation.iv.model, data = my.df, 
                             se = "bootstrap", bootstrap = 5000)
parameterEstimates(mediation.ivboot.fit)
##    lhs op rhs label    est    se      z pvalue ci.lower ci.upper
## 1    m  ~   x     a  0.272 0.031  8.893      0    0.212    0.332
## 2    y  ~   m     b  0.228 0.016 13.898      0    0.194    0.259
## 3    x  ~  I1        0.286 0.011 25.277      0    0.265    0.309
## 4    x  ~  I2        0.287 0.011 25.966      0    0.266    0.309
## 5    m  ~  I3        0.278 0.012 22.738      0    0.254    0.303
## 6    m  ~  I4        0.298 0.012 24.588      0    0.274    0.321
## 7    m ~~   x        0.271 0.042  6.497      0    0.189    0.352
## 8    m ~~   y        0.247 0.028  8.754      0    0.193    0.304
## 9    m ~~   m        1.557 0.028 55.986      0    1.505    1.614
## 10   y ~~   y        1.280 0.020 65.472      0    1.242    1.319
## 11   x ~~   x        1.273 0.018 71.695      0    1.238    1.308
## 12  I1 ~~  I1        0.992 0.000     NA     NA    0.992    0.992
## 13  I1 ~~  I2        0.009 0.000     NA     NA    0.009    0.009
## 14  I1 ~~  I3       -0.009 0.000     NA     NA   -0.009   -0.009
## 15  I1 ~~  I4        0.021 0.000     NA     NA    0.021    0.021
## 16  I2 ~~  I2        0.988 0.000     NA     NA    0.988    0.988
## 17  I2 ~~  I3        0.015 0.000     NA     NA    0.015    0.015
## 18  I2 ~~  I4       -0.006 0.000     NA     NA   -0.006   -0.006
## 19  I3 ~~  I3        0.986 0.000     NA     NA    0.986    0.986
## 20  I3 ~~  I4       -0.006 0.000     NA     NA   -0.006   -0.006
## 21  I4 ~~  I4        0.994 0.000     NA     NA    0.994    0.994
## 22  ab := a*b    ab  0.062 0.009  7.238      0    0.046    0.080

Here we see bootstrapped standard errors of .046 - .080, which is very close the Delta method.


Robust standard errors

mediation.ivrobust.fit <- sem(mediation.iv.model, data = my.df, se = "robust.sem")
parameterEstimates(mediation.ivrobust.fit)
##    lhs op rhs label    est    se      z pvalue ci.lower ci.upper
## 1    m  ~   x     a  0.272 0.032  8.560  0.000    0.210    0.335
## 2    y  ~   m     b  0.228 0.016 13.930  0.000    0.196    0.260
## 3    x  ~  I1        0.286 0.012 23.110  0.000    0.262    0.311
## 4    x  ~  I2        0.287 0.012 23.190  0.000    0.263    0.311
## 5    m  ~  I3        0.278 0.013 21.067  0.000    0.252    0.303
## 6    m  ~  I4        0.298 0.013 22.698  0.000    0.272    0.324
## 7    m ~~   x        0.271 0.043  6.241  0.000    0.186    0.356
## 8    m ~~   y        0.247 0.028  8.799  0.000    0.192    0.302
## 9    m ~~   m        1.557 0.028 55.220  0.000    1.502    1.613
## 10   y ~~   y        1.280 0.020 65.315  0.000    1.241    1.318
## 11   x ~~   x        1.273 0.018 70.326  0.000    1.237    1.308
## 12  I1 ~~  I1        0.992 0.000     NA     NA    0.992    0.992
## 13  I1 ~~  I2        0.009 0.000     NA     NA    0.009    0.009
## 14  I1 ~~  I3       -0.009 0.000     NA     NA   -0.009   -0.009
## 15  I1 ~~  I4        0.021 0.000     NA     NA    0.021    0.021
## 16  I2 ~~  I2        0.988 0.000     NA     NA    0.988    0.988
## 17  I2 ~~  I3        0.015 0.000     NA     NA    0.015    0.015
## 18  I2 ~~  I4       -0.006 0.000     NA     NA   -0.006   -0.006
## 19  I3 ~~  I3        0.986 0.000     NA     NA    0.986    0.986
## 20  I3 ~~  I4       -0.006 0.000     NA     NA   -0.006   -0.006
## 21  I4 ~~  I4        0.994 0.000     NA     NA    0.994    0.994
## 22   m ~1           -0.014 0.013 -1.053  0.292   -0.040    0.012
## 23   y ~1            0.011 0.011  0.931  0.352   -0.012    0.033
## 24   x ~1            0.008 0.012  0.702  0.483   -0.015    0.032
## 25  I1 ~1            0.014 0.000     NA     NA    0.014    0.014
## 26  I2 ~1           -0.009 0.000     NA     NA   -0.009   -0.009
## 27  I3 ~1           -0.002 0.000     NA     NA   -0.002   -0.002
## 28  I4 ~1           -0.003 0.000     NA     NA   -0.003   -0.003
## 29  ab := a*b    ab  0.062 0.009  7.179  0.000    0.045    0.079

Here we see bootstrapped standard errors of .045 - .079, which is very close the Delta and to the bootstrap method.

The key takeaway here is that, particularly in large samples, all three methods produce estimates that are very close. For giggles, you can estimate the same initial model but with N = 150, and observe the differences. Generally though, robust standard errors and bootstrap standard errors are very close.



Brian S. Anderson, Ph.D. | Assistant Professor of Entrepreneurship | andersonbri@umkc.edu
Henry W. Bloch School of Management | University of Missouri - Kansas City
entrepreneurship.bloch.umkc.edu
(c) Brian S. Anderson, 2017