Here is my solution to the Lab assessment for today, based on our Robust Mediation seminar.
Create a dataset based on the observed coefficient estimates from the results on page 13 for the Passion → Locomotion → Grit portion of the model. Set your number of observations equal to 204, which is the same as that reported in the paper, and use set.seed(080203) for your simulation. Then conduct a sensitivity analysis on these data with the mediation package, determining the estimated value of rho at which the indirect effect would be zero. Based on this analysis, what do you conclude about the threat of omitted variable bias in the paper?
library(tidyverse)
library(lavaan)
population.model <- 'Locomotion ~ .49*Passion  # a path
Grit ~ .43*Locomotion  # b path
Grit ~ .0*Passion'  # direct (c-prime) path fixed to zero
# Generate data
set.seed(080203)
my.df <- simulateData(population.model, sample.nobs = 204)
Notice that I specified the population values for the two paths we are interested in (\(a\) and \(b\)), but constrained the direct Passion → Grit path to be equal to zero. This is a judgement call, but as reported in the paper, this path wasn’t hypothesized (though it was freed later).
Any time you generate a dataset based on published results, you want to test your model to see how close you come to what was published. So, we set up a test model.
test.model <- 'Locomotion ~ a * Passion # a path
Grit ~ b * Locomotion # b path
# Defined parameter (indirect effect)
ab := a*b'
test.fit <- sem(test.model, data = my.df)
summary(test.fit)
## lavaan (0.5-23.1097) converged normally after 11 iterations
##
## Number of observations 204
##
## Estimator ML
## Minimum Function Test Statistic 3.285
## Degrees of freedom 1
## P-value (Chi-square) 0.070
##
## Parameter Estimates:
##
## Information Expected
## Standard Errors Standard
##
## Regressions:
## Estimate Std.Err z-value P(>|z|)
## Locomotion ~
## Passion (a) 0.469 0.077 6.116 0.000
## Grit ~
## Locomotion (b) 0.468 0.060 7.857 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .Locomotion 1.079 0.107 10.100 0.000
## .Grit 0.926 0.092 10.100 0.000
##
## Defined Parameters:
## Estimate Std.Err z-value P(>|z|)
## ab 0.220 0.046 4.826 0.000
So a couple of things to note.
While we specified the parameter values, remember this is a simulation, so we’re not going to get those exact values out of a single draw. This is why, to do this correctly, you’d actually not want to use the set.seed function at all; you’d let the draw vary, repeat the simulation, say, 1,000 times, and take the average, which should be statistically indistinguishable from our specified parameters (a rough sketch of that loop follows these notes).
That said, our simulation’s indirect effect is .220, which is only about a hundredth off of the indirect effect implied by the specified model (.49 x .43 ≈ .21), so I’m not too concerned about a substantive difference in the simulation results.
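As a rough sketch of what that repetition could look like (purely illustrative, not part of the assignment; the object names n.reps and ab.estimates are mine), you could wrap the draw-and-fit step in replicate() and average the defined ab parameter across draws:
# Sketch only: re-draw the data many times (no fixed seed), refit the
# test model, and collect the indirect effect from each draw
n.reps <- 1000
ab.estimates <- replicate(n.reps, {
  sim.df <- simulateData(population.model, sample.nobs = 204)
  sim.fit <- sem(test.model, data = sim.df)
  pe <- parameterEstimates(sim.fit)
  pe$est[pe$label == "ab"]  # the defined indirect effect
})
mean(ab.estimates)  # should land very close to .49 * .43 = .2107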
Now let’s set up our model for the simulation using the mediation package. For this one, I’m going to use 1,000 simulations.
library(mediation)
# We first set up our two regression equations
mediator.model <- lm(Locomotion ~ Passion, data = my.df)
outcome.model <- lm(Grit ~ Locomotion + Passion, data = my.df)
# Now we specify the full model using the 'mediate' function
mediation.si.fit <- mediate(mediator.model, outcome.model, treat = "Passion",
mediator = "Locomotion", robustSE = TRUE, sims = 1000)
Now let’s take a look…
summary(mediation.si.fit)
##
## Causal Mediation Analysis
##
## Quasi-Bayesian Confidence Intervals
##
## Estimate 95% CI Lower 95% CI Upper p-value
## ACME 0.2396 0.1478 0.3503 0.00
## ADE -0.1376 -0.2660 -0.0163 0.02
## Total Effect 0.1020 -0.0341 0.2440 0.15
## Prop. Mediated 2.1304 -13.0782 18.9075 0.15
##
## Sample Size Used: 204
##
##
## Simulations: 1000
So again, we’ve got a somewhat higher indirect effect (by about two one-hundredths) than the one reported in the paper. This is fine for our example, but it’s another reason why you would actually run thousands of these simulations to have more confidence.
Now we need to run some simulations using the medsens function.
sensitivity.fit <- medsens(mediation.si.fit, rho.by = 0.1,
effect.type = "indirect", sims = 1000)
summary(sensitivity.fit)
##
## Mediation Sensitivity Analysis for Average Causal Mediation Effect
##
## Sensitivity Region
##
## Rho ACME 95% CI Lower 95% CI Upper R^2_M*R^2_Y* R^2_M~R^2_Y~
## [1,] 0.4 0.0532 -0.0114 0.1178 0.16 0.1021
## [2,] 0.5 -0.0075 -0.0700 0.0550 0.25 0.1596
##
## Rho at which ACME = 0: 0.5
## R^2_M*R^2_Y* at which ACME = 0: 0.25
## R^2_M~R^2_Y~ at which ACME = 0: 0.1596
So we end up with a \(\rho\) of 0.5. We would need to see a correlation of about .5 between the disturbance terms (\(\zeta\)) in order for a simulated omitted variable to lead to a different (not necessarily correct!) nomological conclusion from what is reported in the paper.
Is that a lot? Well, it could be, but then again, it might not be. That’s the rub, and why Imai et al. do not provide a table of “critical values of \(\rho\)” that could be abused by researchers.
Again, this is one simulation. If you were going to do this in real life, you’d want to actually run thousands of simulations, collecting \(\rho\) each time, and then build a confidence interval around the average. That’s for another lab though!
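Just as a teaser, though, here’s a rough sketch of what that loop could look like. Two caveats: I’m assuming the critical value of \(\rho\) is stored in the err.cr.d component of the medsens object (check str(sensitivity.fit) to confirm before trusting this), and the object names are purely illustrative.
# Sketch only: repeat the full pipeline (new data draw, mediate, medsens)
# and collect the critical rho from each run
n.reps <- 100  # keep this modest; each run is itself simulation-heavy
rho.estimates <- replicate(n.reps, {
  sim.df <- simulateData(population.model, sample.nobs = 204)
  med.mod <- lm(Locomotion ~ Passion, data = sim.df)
  out.mod <- lm(Grit ~ Locomotion + Passion, data = sim.df)
  med.fit <- mediate(med.mod, out.mod, treat = "Passion",
                     mediator = "Locomotion", robustSE = TRUE, sims = 1000)
  sens.fit <- medsens(med.fit, rho.by = 0.1,
                      effect.type = "indirect", sims = 1000)
  sens.fit$err.cr.d  # assumed slot holding the rho at which the ACME is zero
})
mean(rho.estimates)                     # average critical rho
quantile(rho.estimates, c(.025, .975))  # a rough interval around it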