Here is my solution to the Lab assessment for today, based on our Moderation: The Basics seminar.
We’ll be generating predicted values of y from a regression equation, so we’ll start by generating our x and m variables.
library(tidyverse)
set.seed(08022003)
my.df <- tibble(x = round(runif(1000, 1, 5), 0),
                m = rbinom(1000, 1, 0.5))
# Mean center x to facilitate interpretation
my.df$x <- (my.df$x - mean(my.df$x))
# Let's take a look at the first five rows of data...
head(my.df, 5)
## # A tibble: 5 × 2
## x m
## <dbl> <int>
## 1 -0.956 1
## 2 0.044 0
## 3 0.044 1
## 4 -0.956 1
## 5 0.044 0
Rather than using the old-fashioned way above, you could mean center x with this shortcut, which can also standardize the variable via the scale = TRUE option :)
# scale() returns a one-column matrix, so coerce it back to a plain vector
my.df$x <- as.numeric(scale(my.df$x, center = TRUE, scale = FALSE))
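For example, to fully standardize x (mean 0, standard deviation 1) rather than just center it, flip scale to TRUE. A quick sketch; the x.std name is my own, and I keep it outside my.df so the summary output below is unchanged:
# Standardize: (x - mean(x)) / sd(x)
x.std <- as.numeric(scale(my.df$x, center = TRUE, scale = TRUE))
# Sanity check: mean should be ~0 and sd should be exactly 1
c(mean(x.std), sd(x.std))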
We can take a look at our variables’ summary statistics using the psych package…
library(psych)
describe(my.df, fast = TRUE)
## vars n mean sd min max range se
## x 1 1000 0.0 1.18 -1.96 2.04 4 0.04
## m 2 1000 0.5 0.50 0.00 1.00 1 0.02
Our now-centered x has a mean of 0 and a standard deviation of 1.18, with a minimum of -1.96 and a maximum of 2.04.
This is the regression equation we’ll be building to generate our predicted values of y: \(y=\alpha+\beta_{1}x+\beta_{2}m+\beta_{3}xm+\varepsilon\)
But before we can predict y, we first need to add our random disturbance term, \(\varepsilon\). The disturbance term is the amount of variance in y not accounted for by the predictors in our model.
Recall that to obtain consistent estimates, \(\varepsilon\) must have a constant variance, \({\sigma}^{2}\), and an expected value, \(\mu\), of zero.
mu <- 0
sigma <- 0.5
my.df$epsilon <- rnorm(1000, mu, sigma) # Add this to our dataframe
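As a quick sanity check, we can confirm that the simulated disturbance lands close to those target values:
# Sample mean should be near mu = 0 and sample sd near sigma = 0.5
c(mean(my.df$epsilon), sd(my.df$epsilon))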
Now let’s specify our coefficients…
my.effects <- tibble(a = 1.2,    # This is our intercept
                     b1 = 0.2,   # This is the effect of x on y
                     b2 = 0.1,   # This is the effect of m on y
                     b3 = 0.15)  # This is the interaction effect
Now we’re ready to generate our predicted values for y.
my.df$y <- my.effects$a +
(my.effects$b1 * my.df$x) +
(my.effects$b2 * my.df$m) +
(my.effects$b3 * my.df$x * my.df$m) + my.df$epsilon
Always a good idea to take a look at the results…
head(my.df, 5)
## # A tibble: 5 × 4
## x m epsilon y
## <dbl> <int> <dbl> <dbl>
## 1 -0.956 1 -0.82718114 0.1382189
## 2 0.044 0 -0.33852200 0.8702780
## 3 0.044 1 0.00586434 1.3212643
## 4 -0.956 1 -0.16154505 0.8038550
## 5 0.044 0 0.81334282 2.0221428
Now let’s estimate our regression model…
interaction.model <- lm(y ~ x*m, data = my.df)
summary(interaction.model)
##
## Call:
## lm(formula = y ~ x * m, data = my.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.48903 -0.32969 0.00309 0.34175 1.96743
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.24390 0.02230 55.772 < 2e-16 ***
## x 0.18331 0.01850 9.910 < 2e-16 ***
## m 0.04837 0.03161 1.530 0.126
## x:m 0.15310 0.02684 5.704 1.54e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4993 on 996 degrees of freedom
## Multiple R-squared: 0.288, Adjusted R-squared: 0.2859
## F-statistic: 134.3 on 3 and 996 DF, p-value: < 2.2e-16
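Since we simulated the data ourselves, it’s worth confirming that the model roughly recovers the coefficients we specified. Here’s a quick comparison; the rbind layout is just one way to do it, and the columns line up because my.effects stores a, b1, b2, b3 in the same order as the model terms:
# Estimated coefficients on top, the true values we simulated with below
rbind(estimated = coef(interaction.model),
      true = unlist(my.effects))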
Our task here is to evaluate the effect of x on y when m = 0. This is effectively our baseline effect, given that x has a meaningful zero value and so does m. The coefficient estimate is the average expected effect (the slope) of x on y when m = 0, in which case the interaction term drops out of the equation entirely.
In our model, this value is:
x.effect <- interaction.model$coefficients["x"]
x.effect
## x
## 0.1833084
Recall that in a moderation model, we’re theoretically expecting that the effect of x on y changes as a function of the level of the moderator, m.
The coefficient on (x)(m) is the change in the effect of x on y when m = 1. Because x has a meaningful zero value, it’s effectively the average expected change in the average expected effect (slope) of x on y.
xm.effect <- interaction.model$coefficients["x:m"]
xm.effect
## x:m
## 0.1530967
Another way to think about it is that you’ve really got two different equations, one for the effect of x on y when m = 0, and another for the effect of x on y when m = 1.
Let’s pull all of our coefficients and then walk through the logic…
interaction.model$coefficients
## (Intercept) x m x:m
## 1.24389958 0.18330843 0.04837227 0.15309668
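To make the two-equations idea concrete, here is a small sketch (the slope.m0 and slope.m1 names are my own) that builds each simple slope directly from the coefficients:
# Effect of x on y when m = 0: just the x coefficient
slope.m0 <- interaction.model$coefficients["x"]
# Effect of x on y when m = 1: the x coefficient plus the interaction
slope.m1 <- interaction.model$coefficients["x"] +
  interaction.model$coefficients["x:m"]
c(m0 = unname(slope.m0), m1 = unname(slope.m1))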
We know that x has a meaningful zero value, and so does m, so our intercept is the predicted value of y when x = 0 (the mean in our case) and m = 0. The coefficient estimate for x is the effect (slope) of x on y when m = 0.
So let’s look at the predicted value of y when x is at its mean value:
y.m0 <- interaction.model$coefficients["(Intercept)"] +
(interaction.model$coefficients["x"] * mean(my.df$x)) +
(interaction.model$coefficients["m"] * 0) +
(interaction.model$coefficients["x:m"] * mean(my.df$x) * 0)
cat("Predicted value of y at mean of x and m = 0:", y.m0)
## Predicted value of y at mean of x and m = 0: 1.2439
Now we’ll get the predicted value of y when x is at its mean and m = 1:
y.m1 <- interaction.model$coefficients["(Intercept)"] +
(interaction.model$coefficients["x"] * mean(my.df$x)) +
(interaction.model$coefficients["m"] * 1) +
(interaction.model$coefficients["x:m"] * mean(my.df$x) * 1)
cat("Predicted value of y at mean of x and m = 1:", y.m1)
## Predicted value of y at mean of x and m = 1: 1.292272
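If you’d rather not do the coefficient arithmetic by hand, predict() should reproduce the same two values, since our centered x has a mean of essentially zero:
# Predicted y at the mean of x for m = 0 and m = 1
predict(interaction.model,
        newdata = data.frame(x = mean(my.df$x), m = c(0, 1)))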
The difference between these two values is the coefficient estimate for m, which is the expected shift in the value of y as m moves from 0 to 1 when x = 0 (the mean in our case):
cat("Difference in predicted values:", y.m1 - y.m0)
## Difference in predicted values: 0.04837227
As we’ll talk about in a minute, the average marginal effect (also called the conditional coefficient) is the average change in the slope of x on y as m changes; because our moderator is dichotomous, it is simply the coefficient estimate for our interaction term:
cat("Average marginal effect:", interaction.model$coefficients["x:m"])
## Average marginal effect: 0.1530967
Now, let’s make a plot of the interaction effect using a variety of different packages.
First up is rockchalk…
library(rockchalk)
plotSlopes(interaction.model, plotx = "x", modx = "m",
modxVals = c(0, 1),
interval = "confidence",
plotPoints = FALSE)
Next is visreg…
library(visreg)
visreg(interaction.model, "x", by = "m",
       overlay = TRUE, partial = FALSE)
And lastly is ggplot2…
library(ggplot2)
interaction.plot <- ggplot(data = my.df,
                           aes(x = x, y = y, colour = factor(m)))
interaction.plot + stat_smooth(method = "lm") +
  theme_minimal() +
  theme(legend.position = "bottom") +
  xlab("x") +
  ylab("Predicted y") +
  labs(colour = "") +
  ggtitle("My Interaction")
Lastly, let’s plot our marginal effects with interplot.
library(interplot)
interplot(m = interaction.model, var1 = "x", var2 = "m") +
geom_hline(yintercept = 0, linetype = "dashed") +
theme_minimal() +
xlab("m") +
ylab("Estimated Effect of x on y") +
ggtitle("Marginal Effect of x on y by m")
When m = 0, the point estimate of the average effect of x on y is simply the coefficient for x, 0.18330843. When m = 1, we add the interaction coefficient, 0.15309668, the average change in the effect of x on y as m moves from 0 to 1. Together, this gives us a point estimate for the average effect of x on y when m = 1 of 0.33640511, which is exactly what our visualization shows.
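If you also want the uncertainty around that m = 1 simple slope (the kind of interval interplot draws), you can build its standard error from the model’s variance-covariance matrix. A minimal sketch, with variable names of my own:
# Var(b1 + b3) = Var(b1) + Var(b3) + 2 * Cov(b1, b3)
vc <- vcov(interaction.model)
slope.m1 <- sum(interaction.model$coefficients[c("x", "x:m")])
slope.m1.se <- sqrt(vc["x", "x"] + vc["x:m", "x:m"] + 2 * vc["x", "x:m"])
# Approximate 95% confidence interval for the effect of x on y when m = 1
slope.m1 + c(-1.96, 1.96) * slope.m1.se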