Moderation: Continuous moderators

ENT5587B - Research Design & Theory Testing II

Brian S. Anderson, Ph.D.
Assistant Professor
Department of Global Entrepreneurship & Innovation
andersonbri@umkc.edu


RIEI Logo
© 2017 Brian S. Anderson

  • How goes it?
  • IRB Applications
  • Paper progress – Due Monday 20th at Noon
  • Interaction duplicity
  • Centering and standardizing
  • Plotting continuous by continuous interactions
  • Marginal effects
  • Lab 16 Feb: Moderation paper critique

\(y=\alpha+{\beta}{x}+{\beta}{m}+{\beta}{x}{m}+\varepsilon\)

Moderation

We’re going to dig deeper into moderation today, and particularly with moderators that are continuous variables.

First though, a little exercise.

Lets start by getting some data…

library(tidyverse)
my.ds <- read_csv("http://a.web.umkc.edu/andersonbri/ENT5587.csv")
my.df <- as.data.frame(my.ds) %>%
         select(RiskTaking, Inn = Innovativeness, 
                RND = RNDIntensity) %>%
         na.omit()

First up, estimate the following model.

\(RiskTaking=\alpha+{\beta}{Innovativeness}+{\beta}{RNDIntensity...}+\) \({...\beta}{Innovativeness*}{RNDIntensity}+\varepsilon\)

innovativeness.model <- lm(RiskTaking ~ Inn*RND, data = my.df)
summary(innovativeness.model)
## 
## Call:
## lm(formula = RiskTaking ~ Inn * RND, data = my.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5355 -0.5570  0.1050  0.6536  2.3631 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.43164    0.38951   3.675 0.000367 ***
## Inn          0.63402    0.10248   6.187 1.05e-08 ***
## RND          0.17263    0.07444   2.319 0.022225 *  
## Inn:RND     -0.03807    0.01646  -2.313 0.022548 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.971 on 111 degrees of freedom
## Multiple R-squared:  0.3661, Adjusted R-squared:  0.349 
## F-statistic: 21.37 on 3 and 111 DF,  p-value: 5.349e-11

So there is a significant interaction effect between Innovativeness and R&D Intensity on Risk Taking. Note that both Innovativeness and R&D Intensity are continuous variables.

Take ten minutes and write a hypothesis (1 paragraph) for why the effect of Innovativeness on Risk Taking changes as a function of a change in the level of R&D Intensity.

Next up, estimate this model…

\(RiskTaking=\alpha+{\beta}{RNDIntensity}+{\beta}{Innovativeness...}+\) \({...\beta}{RNDIntensity*}{Innovativeness}+\varepsilon\)

rnd.model <- lm(RiskTaking ~ RND*Inn, data = my.df)
summary(rnd.model)
## 
## Call:
## lm(formula = RiskTaking ~ RND * Inn, data = my.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5355 -0.5570  0.1050  0.6536  2.3631 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.43164    0.38951   3.675 0.000367 ***
## RND          0.17263    0.07444   2.319 0.022225 *  
## Inn          0.63402    0.10248   6.187 1.05e-08 ***
## RND:Inn     -0.03807    0.01646  -2.313 0.022548 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.971 on 111 degrees of freedom
## Multiple R-squared:  0.3661, Adjusted R-squared:  0.349 
## F-statistic: 21.37 on 3 and 111 DF,  p-value: 5.349e-11

Hmmm…so now we have a significant interaction between R&D Intensity and Innovativeness.

Take ten minutes and write a hypothesis (1 paragraph) for why the effect of R&D Intensity on Risk Taking changes as a function of a change in the level of Innovativeness.

So just taking a step back, which of the two seems the more plausible hypothesis?

Why is the theoretical distinction between x and m important, despite their mathematical equivalency in the model?

Bonus points…anybody see the opportunity to engage in questionable research practices with this feature of moderation models?

So what’s the moral of our story?

Next up, lets talk about centering and standardizing.

We’re going to stick with this model from now on…

\(RiskTaking=\alpha+{\beta}{Innovativeness}+{\beta}{RNDIntensity...}+\) \({...\beta}{Innovativeness*}{RNDIntensity}+\varepsilon\)

And lets create mean centered versions of Innovativeness and R&D Intensity…

my.df$Inn.center <- (my.df$Inn - mean(my.df$Inn))
my.df$RND.center <- (my.df$RND - mean(my.df$RND))

You read papers today on the myth that just won’t die that centering deals with multicollinearity.

I’m not going to rehash these arguments, because I completely agree with them (and you should as well—it’s just basic math). I do though want to quickly show an illustration…

First up, an uncentered variable model.

uncentered.model <- lm(RiskTaking ~ Inn*RND, data = my.df)
library(visreg)
visreg(uncentered.model, "Inn", by="RND", 
       overlay=TRUE, partial=FALSE)

Now, a centered variable model.

centered.model <- lm(RiskTaking ~ Inn.center*RND.center, data = my.df)
visreg(centered.model, "Inn.center", by="RND.center", 
       overlay=TRUE, partial=FALSE)

No, your eyes aren’t playing tricks on you. What difference do you see between these two plots?

The best discussion in my opinion of uncentered and centered predictors is in Chapter 7 of Cohen et al. (2003).

It is a bit technical, but well worth your time.

Feel free to borrow my copy.

Lets walk through what’s going on under the hood of these two models.

library(stargazer)
stargazer(uncentered.model, centered.model, type = "html",
          title = "Model comparisons",
          dep.var.labels.include = FALSE,
          dep.var.caption = "",
          column.labels = c("Uncentered", "Centered"),
          model.numbers = FALSE,
          single.row = TRUE,
          ci = FALSE,
          omit.stat = c("ser"),
          star.cutoffs = c(0.05, 0.01, 0.001),
          star.char = c("*", "**", "***"),
          notes = "† p < .05; ** p < .01; *** p < .001",
          notes.align = c("r"),
          notes.label = "",
          notes.append = FALSE)

Model comparisons
Uncentered Centered
Inn 0.634*** (0.102)
RND 0.173* (0.074)
Inn:RND -0.038* (0.016)
Inn.center 0.445*** (0.067)
RND.center 0.023 (0.027)
Inn.center:RND.center -0.038* (0.016)
Constant 1.432*** (0.390) 4.042*** (0.098)
Observations 115 115
R2 0.366 0.366
Adjusted R2 0.349 0.349
F Statistic (df = 3; 111) 21.368*** 21.368***
† p < .05; ** p < .01; *** p < .001

Pay very careful attention to the coefficient estimate for the interaction xm term. What do you notice?

The short answer is that centering a variable has nothing to do with addressing multicollinearity in an interaction model. Whether you center or don’t center, the highest order coeffecient in the model, xm, does not change. The estimates of the lower order terms do change because we’ve shifted the scale of measurement, but that has nothing to do with collinearity.

We’ll talk about interpreting the coefficients in a minute, but as you have almost certainly read in papers (including mine, tragically), it is common to calculate variance inflation factors in a regression model.

Here’s the uncentered regression model…

library(car)
vif(uncentered.model)
##       Inn       RND   Inn:RND 
##  2.830703  9.748464 14.233548

And the centered…

vif(centered.model)
##            Inn.center            RND.center Inn.center:RND.center 
##              1.206177              1.243624              1.035093

Based on our rule of thumb, a VIF greater than 2.0 suggests that multicollinearity—significant correlations among the predictors—may be inflating the variance of our estimated coefficients in our uncentered model. On the surface, this seems to be a problem.

Remember though our formula for the VIF of the ith coefficient…

\({VIF}_{i}=\frac{1}{1-{R}_{i}^{2}}\)

You calculate the VIF by regressing each predictor on all of the other predictors. The logic is that different variables may inadvertently be capturing similar conceptual domains, rendering inconsistent coefficient estimates because of the high correlations among the predictors.

In an interaction model, the xm term is the product of two variables, x and m, and so not surprisingly, xm correlates strongly with its constituent elements. But this is essential collinearity—it has to be there and will always be there. Fortunately though, linear models are very robust to essential collinearity. It doesn’t bother the estimates at all, it doesn’t impact the estimates of the lower order terms, it doesn’t influence the t statistic or p-value, nor does it impact the R2 of the overall model.

Bottom line, feel free to ignore.

Nonessential collinearity due to measurement error or random error is a different story, and centering does help address these issues, so it’s among the many reasons why you should center.

Lets revisit our coefficient estimates…

Model comparisons
Uncentered Centered
Inn 0.634*** (0.102)
RND 0.173* (0.074)
Inn:RND -0.038* (0.016)
Inn.center 0.445*** (0.067)
RND.center 0.023 (0.027)
Inn.center:RND.center -0.038* (0.016)
Constant 1.432*** (0.390) 4.042*** (0.098)

In both models, the coefficient estimate for the xm term is the average change in the effect of x on y across all observed values of m. We’ll come back to this in a minute.

Also in both models..

  • The coefficient estimate for x is the estimated effect of x on y when m = 0.
  • The coefficient estimate for m is the estimated effect of m on y when x = 0.

Anybody see the issue with this?

Hint…think about how we interpret an intercept.

It’s always appropriate to interpret the lower order coefficients in an interaction model, it’s just not all that easy.

If there is no meaningful value of zero in either x or m, then the coefficient estimate is the effect of x (or m) on y in a hypothetical world that m = 0 (or x = 0) exists.

So absent this context, many scholars simply chose to ignore them (which they shouldn’t!).

So the simple solution to improve interpretability is to give the data a meaningful zero value. We do this by shifting the scale of measurement such that mean of the vector of variables is zero. We don’t do anything to the variance of the variable, just move it over by subtracting each observation from the mean of the variable…

# We'll use our Innovativeness measure
centering.plot <- ggplot(my.df, aes(Inn)) + 
                geom_density((aes(colour = "Uncentered"))) +
                geom_density(data=my.df, (aes(Inn.center, colour = "Centered"))) +
                scale_colour_manual("", values = c("darkblue", "firebrick")) +
                labs(title = "Centered Versus Uncentered Innovativeness",
                     x = "Scale", 
                     y = "Distribution Density")
centering.plot + theme_minimal() + 
                 xlim(-10, 10)

It’s the same variance, we just moved the scale to the left.

So with our variables centered, it’s easier to make sense of our variables…

Beta1 <- centered.model$coefficients["Inn.center"]
Beta2 <- centered.model$coefficients["RND.center"]
Beta3 <- centered.model$coefficients["Inn.center:RND.center"]

Predicted average effect of x on y when m = 0: 0.4449012

Predicted average effect of m on y when x = 0: 0.02266476

Predicted average change in the effect of x on y as m increases -0.03807064

If only it was that simple…

The presence of the interaction term means that the effect of x (or of m for that matter) changes as a function of the level of m. In a continuous by continuous interaction however, m takes on, well, lots of different levels.

In a multiple regression model, x and m are additive. But in an interaction model with a continuous moderator, xm increases curvilinearly as x and m increase linearly (we’ll see a plot later that helps make sense of this).

The net result is that the coefficient estimate for xm is simply the average of those curvilinear changes, which makes the value of the coefficient estimate itself not all that useful (unlike with a dichotomous moderator).

What’s more, the way that we commonly view continuous by continuous interactions really doesn’t tell much of the story.

In fact, it can be downright misleading.

Lets take a look again at the most common way to visualize a continuous by continuous interaction. Usually, researchers plot the effect of x on y at +/- 1 standard deviation (and sometimes at the mean) of x and m, and cite Aiken and West (1991) or Cohen et al. (2003) as justification.

We can do that with visreg by setting two breaks at -4 and 4, which roughly corresponds to +/- 1 standard deviation in our data.

visreg(centered.model, "Inn.center", by="RND.center", 
       overlay=TRUE, partial=FALSE, breaks=c(-4, 4))

Removing the slope of the x –> y effect when m = 0 (the mean) does help clarify the picture. While not a perfect proxy, the areas where the confidence intervals substantially overlap indicates areas where the two slopes are not significantly different.

Also common with this type of analysis is a point comparison. We’re going to test whether the slopes are significantly different from each other at our specified values of m.

Personally, I don’t think this has a lot of value, but it’s common and doesn’t hurt. You can do it by hand with a simple t-test comparison, or pequod is one of many packages that will do it for you.

library(pequod)
point.model <- lmres(RiskTaking ~ Inn.center*RND.center, data=my.df)
slope.compare <- simpleSlope(point.model,pred="Inn.center", mod1="RND.center")
summary(slope.compare)
## 
## ** Estimated points of RiskTaking  **
## 
##                         Low Inn.center (-1 SD) High Inn.center (+1 SD)
## Low RND.center (-1 SD)                  3.0742                  4.8363
## High RND.center (+1 SD)                 3.6807                  4.5756
## 
## 
## 
## ** Simple Slopes analysis ( df= 111 ) **
## 
##                         simple slope standard error t-value p.value    
## Low RND.center (-1 SD)        0.5901         0.0889    6.63  <2e-16 ***
## High RND.center (+1 SD)       0.2997         0.0944    3.17   0.002 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## 
## ** Bauer & Curran 95% CI **
## 
##            lower CI upper CI
## RND.center   5.6344   78.451

Point comparisons though just give you a narrow snapshot of the moderation effect.

The statistically significant coefficient for the xm term just says that on average the slope of x on y changes as a function of the average level of m (m = 0). But in a continuous by continuous interaction, m constantly changes, which means so does the effect of x on y. Depending on the strength of the interaction effect, there are likely areas in which the myriad of potential slopes are not significantly different.

This is what we call the region of significance.

We are really interested in evaluating the marginal effects of m on the relationship between x and y.

The ‘correct’ approach is to actually calculate, and evaluate, the individual slopes across all observed values of m. You can do this by hand, or there are several ways to do this in R.

You can produce a table of outcomes doing this, but I think it’s easier to visualize marginal effects. So, we’re going to use interplot again.

library(interplot)
interplot(m = centered.model, var1 = "Inn.center", var2 = "RND.center") +
          geom_hline(yintercept = 0, linetype = "dashed") + 
          theme_minimal() + 
          xlab("R&D Intensity") + 
          ylab("Estimated Effect of Innovativeness\nOn Risk Taking") +
          ggtitle("Marginal Effect of Innovativeness\nOn Risk Taking by R&D Intensity")

This is straightforward to interpret with practice. The area of the plot where the confidence interval contains zero are the values of R&D Intensity where the effect of Innovativeness and Risk Taking does NOT vary as a function of the level of R&D Intensity.

To put it another way, there is NO interaction effect between Innovativeness and R&D Intensity on Risk Taking when R&D Intensity is roughly greater than five (on a centered scale).

That the line slopes downward corresponds to our simple slopes plot , where we observed the strongest effect of Innovativeness on Risk Taking for firms that were below the mean value of R&D Intensity. The strength of the effect of Innovativeness on Risk Taking gets smaller as R&D Intensity gets bigger.

Ok, that’s a lot to take in.

But, we’re not done.

As we discussed, the common way to visualize a continuous by continuous interaction is to effectively dichotomize the continuous moderator, which often leads to interpretation problems. It’s even possible to have a significant interaction effect at a particular value of m even when the average effect (the xm term) is not significant in the overall model.

So is there another way? Why yes, yes there is.

Lets walk through two of them, and we’re going to use the visreg package here.

First up is a contour plot…

visreg2d(centered.model, "Inn.center", "RND.center", plot.type = "image",
         xlab = "Innovativeness", ylab = "R&D Intensity")

Next is a perspective plot…

visreg2d(centered.model, "Inn.center", "RND.center", plot.type = "persp",
         xlab = "Innovativeness", ylab = "R&D Intensity", zlab = "Risk Taking",
         col = heat.colors(1, alpha = 0.7))

I still have a hard time making sense of these, but I’m also incorporating them into my own research, so keep at it!

Last topic—standardization.

Standardized variable models function in much the same way as mean centered models, in that you have a meaningful zero value for x and m. But, because you normalized the scale by its variance, the values are expressed in standard deviation units. Note that this is not standardized coeffecient estimates, but standardized (z score) variables.

Lets create standardized values (mean of 0, s.d. = 1) for Innovativeness and Risk Taking.

my.df$Inn.stand <- as.vector(scale(my.df$Inn, center = TRUE, scale = TRUE))
my.df$RND.stand <- as.vector(scale(my.df$RND, center = TRUE, scale = TRUE))

Lets take a look at our final dataset…

library(psych)
describe(my.df, fast = TRUE)
##            vars   n mean   sd   min   max range   se
## RiskTaking    1 115 3.95 1.20  1.00  6.33  5.33 0.11
## Inn           2 115 3.94 1.49  1.00  7.00  6.00 0.14
## RND           3 115 4.97 3.81  0.00 17.28 17.28 0.36
## Inn.center    4 115 0.00 1.49 -2.94  3.06  6.00 0.14
## RND.center    5 115 0.00 3.81 -4.97 12.31 17.28 0.36
## Inn.stand     6 115 0.00 1.00 -1.97  2.05  4.02 0.09
## RND.stand     7 115 0.00 1.00 -1.30  3.23  4.53 0.09

Now lets do a standardized variable model.

standardized.model <- lm(RiskTaking ~ Inn.stand*RND.stand, data = my.df)
summary(standardized.model)
## 
## Call:
## lm(formula = RiskTaking ~ Inn.stand * RND.stand, data = my.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5355 -0.5570  0.1050  0.6536  2.3631 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          4.04170    0.09822  41.148  < 2e-16 ***
## Inn.stand            0.66426    0.09988   6.651 1.14e-09 ***
## RND.stand            0.08645    0.10141   0.852   0.3958    
## Inn.stand:RND.stand -0.21681    0.09372  -2.313   0.0225 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.971 on 111 degrees of freedom
## Multiple R-squared:  0.3661, Adjusted R-squared:  0.349 
## F-statistic: 21.37 on 3 and 111 DF,  p-value: 5.349e-11

Look familiar?

Even though it’s common to see continuous by continuous interaction plots of +/- 1 standard deviation, often these plots were made using centered, not standardized values. The interpretation of the general effect would be same, but the scale would not be.

For the same reason that I think standardizing coefficient estimates in regression makes it difficult to interpret effect sizes, I prefer centering variables in interaction models to standardizing. The choice is yours, but as more journals are asking for substantive conversations of effect sizes and confidence intervals, the less you have to convert coefficient estimates to ‘actual’ numbers, the better.

Wrap-up.

Lab 16 Feb – Moderation paper critique

Seminar 20 Feb – Special considerations in moderation studies // Moderation in LDV models