Here is my solution to the Lab assessment for today, based on our Moderation: The Basics seminar.


Deliverable

  • Generate a dataset of 1,000 observations with a continuous dependent variable (y), a continuous predictor variable based on a five-point Likert-type scale (x), and a dichotomous moderator (m) based on a regression equation that you specify. Please make sure to mean-center your focal predictor (x).
  • Estimate a moderation model equivalent to: \(y=\alpha+\beta_{1}x+\beta_{2}m+\beta_{3}xm+\varepsilon\)
  • Evaluate the effect on y for the coefficient estimate of x when m = 0.
  • Evaluate the change in the effect of x on y when m = 1.
  • Provide a visualization of the effect of x on y at m = 0 and m = 1, and discuss any areas where there may be no statistical difference between the two slopes.
  • Provide a visualization of the marginal effect of x on y at m = 0 and m = 1.

Generate Dataset

We’ll be generating predicted values of y from a regression equation, so we’ll start by generating our x and m variables.

library(tidyverse)
set.seed(08022003)
# data_frame() is deprecated; tibble() is the current equivalent
my.df <- tibble(x = round(runif(1000, 1, 5), 0),
                m = rbinom(1000, 1, .5))
# Mean center x to facilitate interpretation
my.df$x <- (my.df$x - mean(my.df$x))
# Let's take a look at the first five rows of data...
head(my.df, 5)
## # A tibble: 5 × 2
##        x     m
##    <dbl> <int>
## 1 -0.956     1
## 2  0.044     0
## 3  0.044     1
## 4 -0.956     1
## 5  0.044     0

Rather than using the old-fashioned way above, you could mean-center x using this shortcut, which can also standardize the variable with the scale = TRUE option :)

# scale() returns a one-column matrix, so wrap it in as.numeric()
my.df$x <- as.numeric(scale(my.df$x, center = TRUE, scale = FALSE))

We can take a look at our variables’ summary statistics using the psych package…

library(psych)
describe(my.df, fast = TRUE)
##   vars    n mean   sd   min  max range   se
## x    1 1000  0.0 1.18 -1.96 2.04     4 0.04
## m    2 1000  0.5 0.50  0.00 1.00     1 0.02

Our now-centered x has a mean of 0 and a standard deviation of 1.18, with a minimum of -1.96 and a maximum of 2.04.


Generate Our Y Values

This is the regression equation we’ll be building to generate our predicted values of y: \(y=\alpha+\beta_{1}x+\beta_{2}m+\beta_{3}xm+\varepsilon\)

But before we can predict y, we first need to add our random disturbance term, \(\varepsilon\). The disturbance term captures the variation in y not accounted for by the predictors in our model.

Recall that to render consistent estimates, \(\varepsilon\) must have a constant variance, \(\sigma^{2}\), and an expected value, \(\mu\), of zero.

mu <- 0
sigma <- .5
my.df$epsilon <- rnorm(1000, mu, sigma)  # Add this to our dataframe
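
As a quick sanity check (not part of the deliverable, just a verification sketch using base R), we can confirm the simulated disturbances land near their targets; in a finite sample they won’t match exactly:

mean(my.df$epsilon)  # Should be close to mu = 0
sd(my.df$epsilon)    # Should be close to sigma = 0.5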

Now let’s specify our coefficients…

my.effects <- tibble(a = 1.2,    # This is our intercept
                     b1 = 0.2,   # This is the effect of x on y
                     b2 = 0.1,   # This is the effect of m on y
                     b3 = 0.15)  # This is the interaction effect

Now we’re ready to generate our predicted values for y.

my.df$y <- my.effects$a +
           (my.effects$b1 * my.df$x) +
           (my.effects$b2 * my.df$m) + 
           (my.effects$b3 * my.df$x * my.df$m) + my.df$epsilon

It’s always a good idea to take a look at the results…

head(my.df, 5)
## # A tibble: 5 × 4
##        x     m     epsilon         y
##    <dbl> <int>       <dbl>     <dbl>
## 1 -0.956     1 -0.82718114 0.1382189
## 2  0.044     0 -0.33852200 0.8702780
## 3  0.044     1  0.00586434 1.3212643
## 4 -0.956     1 -0.16154505 0.8038550
## 5  0.044     0  0.81334282 2.0221428

Estimate Our Model

Now let’s estimate our regression model…

interaction.model <- lm(y ~ x*m, data = my.df)
summary(interaction.model)
## 
## Call:
## lm(formula = y ~ x * m, data = my.df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.48903 -0.32969  0.00309  0.34175  1.96743 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.24390    0.02230  55.772  < 2e-16 ***
## x            0.18331    0.01850   9.910  < 2e-16 ***
## m            0.04837    0.03161   1.530    0.126    
## x:m          0.15310    0.02684   5.704 1.54e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4993 on 996 degrees of freedom
## Multiple R-squared:  0.288,  Adjusted R-squared:  0.2859 
## F-statistic: 134.3 on 3 and 996 DF,  p-value: < 2.2e-16
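
Because we generated the data ourselves, a handy check (not part of the original deliverable) is to line the estimates up against the true effects we specified; everything here is base R:

# Compare the true effects we specified to the estimated coefficients;
# they should agree within sampling error
cbind(true = unlist(my.effects), estimated = coef(interaction.model))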

Effect of X on Y

Our task here is to evaluate the effect of x on y when m = 0. This is effectively our baseline effect, given that both x and m have meaningful zero values. The coefficient estimate for x is the average expected effect (the slope) of x on y when m = 0, because at m = 0 the interaction term drops out of the equation.

In our model, this value is:

x.effect <- interaction.model$coefficients["x"]
x.effect
##         x 
## 0.1833084

Change in Effect of X on Y

Recall that in a moderation model, we’re theoretically expecting that the effect of x on y changes as a function of the level of the moderator, m.

The coefficient on the x:m interaction is the change in the effect of x on y when m = 1. Because x has a meaningful zero value, it’s effectively the average expected change in the slope of x on y as m moves from 0 to 1.

xm.effect <- interaction.model$coefficients["x:m"]
xm.effect
##       x:m 
## 0.1530967

Another way to think about it is that you’ve really got two different equations, one for the effect of x on y when m = 0, and another for the effect of x on y when m = 1.

Let’s pull all of our coefficients and then walk through the logic…

interaction.model$coefficients
## (Intercept)           x           m         x:m 
##  1.24389958  0.18330843  0.04837227  0.15309668

We know that x has a meaningful zero value, and so does m, so our intercept is the predicted value of y when x = 0 (the mean in our case) and m = 0. The coefficient estimate for x is the effect (slope) of x on y when m = 0.
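
To make the “two equations” idea concrete, here’s a short sketch (not part of the original output; all base R) that pulls both simple slopes from the fitted model:

b <- coef(interaction.model)
# Simple slope of x when m = 0: just the coefficient on x
slope.m0 <- unname(b["x"])
# Simple slope of x when m = 1: coefficient on x plus the interaction
slope.m1 <- unname(b["x"] + b["x:m"])
c(m0 = slope.m0, m1 = slope.m1)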

So let’s look at the predicted value of y when x is at its mean value and m = 0:

y.m0 <- interaction.model$coefficients["(Intercept)"] +
        (interaction.model$coefficients["x"] * mean(my.df$x)) + 
        (interaction.model$coefficients["m"] * 0) + 
        (interaction.model$coefficients["x:m"] * mean(my.df$x) * 0)
cat("Predicted value of y at mean of x and m = 0:", y.m0)
## Predicted value of y at mean of x and m = 0: 1.2439

Now we’ll get the predicted value of y when x is at its mean and m = 1:

y.m1 <- interaction.model$coefficients["(Intercept)"] +
        (interaction.model$coefficients["x"] * mean(my.df$x)) + 
        (interaction.model$coefficients["m"] * 1) + 
        (interaction.model$coefficients["x:m"] * mean(my.df$x) * 1)
cat("Predicted value of y at mean of x and m = 1:", y.m1)
## Predicted value of y at mean of x and m = 1: 1.292272

The difference between these two values is the coefficient estimate for m, which is the expected shift in the value of y as m moves from 0 to 1 when x = 0 (the mean in our case):

cat("Difference in predicted values:", y.m1 - y.m0)
## Difference in predicted values: 0.04837227
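
We can also cross-check those two hand-computed values with predict(); x = 0 here corresponds to the mean of our centered x:

# Same two fitted values straight from predict(): x = 0 (the mean of
# our centered x) with m = 0 and m = 1
predict(interaction.model, newdata = data.frame(x = 0, m = c(0, 1)))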

As we’ll talk about in a minute, because our moderator is dichotomous, the change in the average marginal effect of x (also called the conditional coefficient), that is, the average change in the slope of x on y as m moves from 0 to 1, is the coefficient estimate for our interaction term:

cat("Average marginal effect:", interaction.model$coefficients["x:m"])
## Average marginal effect: 0.1530967

Interaction Plots

Now let’s plot the interaction effect using a few different packages.

First up is rockchalk

library(rockchalk)
plotSlopes(interaction.model, plotx = "x", modx = "m", 
           modxVals = c(0, 1),
           interval = "confidence",
           plotPoints = FALSE)

Next is visreg

library(visreg)
visreg(interaction.model, "x", by="m", 
       overlay=TRUE, partial=FALSE)

And lastly is ggplot2

library(ggplot2)
interaction.plot <- ggplot(data = my.df, 
                           aes(x = x, y = y, colour=factor(m)))
interaction.plot + stat_smooth(method="lm") + 
                   theme_minimal() + 
                   theme(legend.position = "bottom") + 
                   xlab("x") + 
                   ylab("Predicted y") + 
                   labs(colour="") +
                   ggtitle("My Interaction")


Marginal Effect

Lastly, let’s plot our marginal effects with interplot.

library(interplot)
interplot(m = interaction.model, var1 = "x", var2 = "m") +
          geom_hline(yintercept = 0, linetype = "dashed") + 
          theme_minimal() + 
          xlab("m") + 
          ylab("Estimated Effect of x on y") +
          ggtitle("Marginal Effect of x on y by m")

Because our x value is mean centered, when m = 0, the point estimate of the average effect of x on y is the coefficient for x, 0.18330843. When m = 1, we add the coefficient for the average change in the effect of x on y, 0.15309668, which is the coefficient for the interaction term. Together, this gives us a point estimate for the average effect of x on y when m = 1 of 0.33640511, which is what our visualization shows.



Brian S. Anderson, Ph.D. | Assistant Professor of Entrepreneurship | andersonbri@umkc.edu
Henry W. Bloch School of Management | University of Missouri - Kansas City
entrepreneurship.bloch.umkc.edu
(c) Brian S. Anderson, 2017