Here is my solution to the Lab assessment for today, based on our multilevel modeling seminar series.


Deliverable

Simulate a dataset resulting from a metric conjoint experiment using the code I provide in Slack. Assume that you have 100 participants (n = 100), each of whom saw four scenarios of two variables each taking on a value of low (0) and high (1) (N = 400). The dependent variable is continuous, based on a 1-7 Likert-type scale, and is normally distributed. Please use set.seed(080203) for your simulation, and make sure to use clustered standard errors. Return to me a properly formatted and annotated R Notebook in .html form with the following analyses and discussion:

  • A line-by-line discussion of what each line of code is doing in creating the simulated data
  • A discussion for why the i0 and i terms should be set to zero in the simulation
  • A correlation matrix of the pooled predictors, and a discussion of why you observed the values that you did
  • Models…
    • An OLS model
    • A pooled model using the ‘plm’ package
    • A random effect model using the ‘plm’ package
    • A fixed effect model using the ‘plm’ package
    • A random effect model using the ‘lme4’ package
  • A discussion of the differences you observed across the models
  • A discussion of why you must use clustered standard errors in conjoint models
  • A discussion of the implications of your findings for interpreting the results of metric conjoint experiments–just what, specifically, do the coefficient estimates mean?
  • A discussion of the implications for estimating cross-level interactions in metric conjoint experiments

Generate dataset

We're going to start by building our basic multilevel structure, with subject IDs denoting our \(i's\) and scenario IDs denoting our \(t's\). Each subject will see four scenarios, per our deliverable.

library(tidyverse)
# Start with the basic multilevel structure
conjoint.df <- data_frame(SubjectID = 1:100, MinScenario = 1, MaxScenario = 4)
mixed.df <- conjoint.df %>%
  group_by(SubjectID) %>%
    expand(MinScenario, MaxScenario, 
           ScenarioID = full_seq(MinScenario:MaxScenario, 1)) %>%
  select(SubjectID, ScenarioID)

Now we are going to create the conjoint scenarios, and then join (merge) them with the mixed.df dataframe. In a metric conjoint experiment, each subject sees the exact same combination of predictor variables at the exact same values, so all we need to do is replicate the scenarios across all of our subjects. Note that documentation on joining (merging) dataframes is, to say the least, enormously helpful.

# Create the conjoint scenarios
library(tidyverse)
ScenarioID = c(1, 2, 3, 4)
x1 = c(0, 1, 0, 1)
x2 = c(0, 0, 1, 1)
scenarios.df = data.frame(ScenarioID, x1, x2)

# Now create the conjoint dataframe itself
conjoint.df <- inner_join(mixed.df, scenarios.df)
head(conjoint.df, 8)
## Source: local data frame [8 x 4]
## Groups: SubjectID [2]
## 
##   SubjectID ScenarioID    x1    x2
##       <int>      <dbl> <dbl> <dbl>
## 1         1          1     0     0
## 2         1          2     1     0
## 3         1          3     0     1
## 4         1          4     1     1
## 5         2          1     0     0
## 6         2          2     1     0
## 7         2          3     0     1
## 8         2          4     1     1

Everything looks good so far.


Generate Disturbance Terms

Now we need to add the two disturbance terms in our panel equation, \(\mu_{i}\) and \(\epsilon_{it}\). Remember that \(\mu_{i}\) is unique to each \(i\), our subjects. The \(\epsilon_{it}\) term is idiosyncratic to the particular observation, and we give it a mean of zero and a constant standard deviation (\(\sigma = 1\)) to ensure consistent parameter estimates.
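For reference, here is the panel equation we are simulating (a generic two-predictor, random intercept form consistent with how we build the data below; the specific coefficient values come in the predicted-values section):

\[ y_{it} = \beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \mu_{i} + \epsilon_{it} \]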

set.seed(08022003)
conjoint.df <- conjoint.df %>%
  group_by(SubjectID) %>%
    mutate(u_i = 0) %>%
  mutate(e_it = rnorm(n(),0,1))

Unlike in our seminar example, I set \(\mu_{i}\) equal to zero. Also not appearing here is \(\mu_{i0}\), which we used in the seminar to specify a random slope for each \(i\); we could just as easily have set that term equal to zero as well. The reason is something unique to a conjoint design. Let's explore it with two summary tables of \(x1\).

conjoint.df %>%
  group_by(SubjectID) %>%
    summarise(mean(x1), sd(x1))
## # A tibble: 100 × 3
##    SubjectID `mean(x1)`  `sd(x1)`
##        <int>      <dbl>     <dbl>
## 1          1        0.5 0.5773503
## 2          2        0.5 0.5773503
## 3          3        0.5 0.5773503
## 4          4        0.5 0.5773503
## 5          5        0.5 0.5773503
## 6          6        0.5 0.5773503
## 7          7        0.5 0.5773503
## 8          8        0.5 0.5773503
## 9          9        0.5 0.5773503
## 10        10        0.5 0.5773503
## # ... with 90 more rows
conjoint.df %>%
  group_by(ScenarioID) %>%
    summarise(mean(x1), sd(x1))
## # A tibble: 4 × 3
##   ScenarioID `mean(x1)` `sd(x1)`
##        <dbl>      <dbl>    <dbl>
## 1          1          0        0
## 2          2          1        0
## 3          3          0        0
## 4          4          1        0

What I want you to pay attention to is the mean and standard deviation of \(x1\) across two ways of looking at the variable. You'll notice that within subjects (the first table), the mean value of \(x1\) is .5 (and yes, the same holds for both variables; go ahead, try it), and that the standard deviation of \(x1\) is exactly the same for every subject. Then look at the scenarios (the second table). Within each scenario, the mean value of \(x1\) is simply the value assigned to that scenario, and its standard deviation is zero, because every subject sees the same value of \(x1\) in a given scenario.

Put together, there is no difference in the mean value of \(x1\) within subjects versus across subjects, and every subject has exactly the same within-subject variance of \(x1\). Because \(x1\) and \(x2\) take on exactly the same values for every subject/scenario combination, each predictor has exactly the same mean and standard deviation (variance) within every subject and, just as important, no variation at all between (or across, it means the same thing) subjects. Within each scenario, the predictors do not vary across subjects at all.

Ok, so is this a good thing? In a metric conjoint design you make an orthogonality assumption, and the full factorial design ensures that each subject sees all levels of all factors. The assumption underlying orthogonality is that the factors (\(x's\)) are uncorrelated, which allows for unambiguous interpretation of the effect of each factor on the criterion. So, if you have two factors (\(x's\)) that you want to test against a \(y\) criterion, and you have two levels (low | high) for each factor, you end up with \(2^2 = 4\) experimental conditions (\(\text{levels}^{\text{factors}}\)). Add a third factor, \(x3\), and you would have \(2^3 = 8\) conditions; keep two factors but add another level (low | medium | high) and you get \(3^2 = 9\) conditions. The orthogonality assumption is really important for properly comparing the coefficient estimate of a focal \(x\) (\(x1\)) to another \(x\) (say \(x2\)).
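As a quick illustration of that counting rule (a sketch using base R's expand.grid(), separate from the assignment data), we can enumerate the full factorial designs just described:

# The number of conditions in a full factorial design is levels^factors
nrow(expand.grid(x1 = c(0, 1), x2 = c(0, 1)))               # 2^2 = 4 conditions
nrow(expand.grid(x1 = c(0, 1), x2 = c(0, 1), x3 = c(0, 1))) # 2^3 = 8 conditions
nrow(expand.grid(x1 = c("low", "medium", "high"),
                 x2 = c("low", "medium", "high")))          # 3^2 = 9 conditions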

We can see this with a pooled correlation matrix of \(x1\) and \(x2\).

cor.df <- subset(conjoint.df, select = c(x1, x2))
cor(cor.df) 
##    x1 x2
## x1  1  0
## x2  0  1

This helpful feature of metric conjoint aside, when you get to the math, because all subjects are exposed to the same set of conditions in this experiment, you end up with what we observed above in the summary tables. For each subject (\(i\)), looking across all scenarios (\(t\)), the mean value of \(x1\) is .5 and the variance is identical for every subject. Within each scenario (\(t\)), the mean value of \(x1\) is the value assigned by the researcher (0 | 1) and the variance is zero, because the value of \(x1\) never changes across subjects (\(i\)) within a given scenario (\(t\)).

This means that the mean and variance of \(\mu_{i}\) are constant for every \(i\) in the sample. Because it is constant, it is perfectly collinear with the intercept across all observations and hence reduces to zero. We will see why that is important in a minute.
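One quick way to see this, reusing the dataframe we already built: the subject-level mean of \(x1\) is identical for every subject, so its between-subject standard deviation is exactly zero.

# The between-subject standard deviation of the subject means of x1 is zero,
# because every subject sees the same four scenarios
conjoint.df %>%
  group_by(SubjectID) %>%
  summarise(x1_bar = mean(x1)) %>%
  summarise(between_sd = sd(x1_bar))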


Generate Predicted Values

As in our other simulations, we can generate the predicted values of \(y\) by building up our regression equation. Here, I'm using an effect size of .45 for \(x1\), .85 for \(x2\), and an intercept of 3 for \(y\).

conjoint.df <- conjoint.df %>%
  mutate(y = round(3 + (.45 * x1) + (.85 * x2) + u_i + e_it))

# Let's check the distribution of y...
hist(conjoint.df$y)


Models

So I’m going to make this easy and specify all of my models up front…

library(plm)
library(lme4)

# OLS model
ols.model <- lm(y ~ x1 + x2, data = conjoint.df)

# Pooled effect model - plm
pooled.model <- plm(y ~ x1 + x2, data = conjoint.df, 
                    index=c("SubjectID","ScenarioID"), model = "pooling")
## This series is constant and has been removed: u_i
# Random effect model - plm
random.model <- plm(y ~ x1 + x2, data = conjoint.df, 
                    index=c("SubjectID","ScenarioID"), model = "random")
## This series is constant and has been removed: u_i
# Fixed effect model - plm
fixed.model <- plm(y ~ x1 + x2, data = conjoint.df, 
                   index=c("SubjectID","ScenarioID"), model = "within")
## This series is constant and has been removed: u_i
# Mixed effect model - lme4
mixed.model <- lmer(y ~ x1 + x2 + (1 | SubjectID), data = conjoint.df)

Well, you probably got a message like the one I got above. Why is that? Well, let's look at the summary output for each model.

OLS

summary(ols.model)
## 
## Call:
## lm(formula = y ~ x1 + x2, data = conjoint.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4575 -0.7375  0.1075  0.5425  2.2625 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.89250    0.08001  36.151  < 2e-16 ***
## x1           0.56500    0.09239   6.115 2.31e-09 ***
## x2           0.84500    0.09239   9.146  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9239 on 397 degrees of freedom
## Multiple R-squared:  0.2337, Adjusted R-squared:  0.2298 
## F-statistic: 60.52 on 2 and 397 DF,  p-value: < 2.2e-16

Pooled

summary(pooled.model)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = y ~ x1 + x2, data = conjoint.df, index = c("SubjectID", 
##     "ScenarioID"), method = "pooling")
## 
## Balanced Panel: n=100, T=4, N=400
## 
## Residuals :
##    Min. 1st Qu.  Median 3rd Qu.    Max. 
##  -2.540  -0.545  -0.045   0.610   2.200 
## 
## Coefficients :
##    Estimate Std. Error t-value  Pr(>|t|)    
## x1 0.565000   0.092309  6.1207 2.929e-09 ***
## x2 0.845000   0.092309  9.1540 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    357.25
## Residual Sum of Squares: 253.92
## R-Squared:      0.28922
## Adj. R-Squared: 0.048322
## F-statistic: 60.6298 on 2 and 298 DF, p-value: < 2.22e-16

Random effect - plm

summary(random.model)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = y ~ x1 + x2, data = conjoint.df, index = c("SubjectID", 
##     "ScenarioID"), method = "random")
## 
## Balanced Panel: n=100, T=4, N=400
## 
## Residuals :
##    Min. 1st Qu.  Median 3rd Qu.    Max. 
##  -2.540  -0.545  -0.045   0.610   2.200 
## 
## Coefficients :
##    Estimate Std. Error t-value  Pr(>|t|)    
## x1 0.565000   0.092309  6.1207 2.929e-09 ***
## x2 0.845000   0.092309  9.1540 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    357.25
## Residual Sum of Squares: 253.92
## R-Squared:      0.28922
## Adj. R-Squared: 0.048322
## F-statistic: 60.6298 on 2 and 298 DF, p-value: < 2.22e-16

Fixed effect - plm

summary(fixed.model)
## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = y ~ x1 + x2, data = conjoint.df, index = c("SubjectID", 
##     "ScenarioID"), method = "within")
## 
## Balanced Panel: n=100, T=4, N=400
## 
## Residuals :
##    Min. 1st Qu.  Median 3rd Qu.    Max. 
##  -2.540  -0.545  -0.045   0.610   2.200 
## 
## Coefficients :
##    Estimate Std. Error t-value  Pr(>|t|)    
## x1 0.565000   0.092309  6.1207 2.929e-09 ***
## x2 0.845000   0.092309  9.1540 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    357.25
## Residual Sum of Squares: 253.92
## R-Squared:      0.28922
## Adj. R-Squared: 0.048322
## F-statistic: 60.6298 on 2 and 298 DF, p-value: < 2.22e-16

Random effect - lme4

summary(mixed.model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ x1 + x2 + (1 | SubjectID)
##    Data: conjoint.df
## 
## REML criterion at convergence: 1079
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.6578 -0.7949  0.1153  0.5908  2.4423 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  SubjectID (Intercept) 0.00149  0.03859 
##  Residual              0.85210  0.92309 
## Number of obs: 400, groups:  SubjectID, 100
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  2.89250    0.08004   36.14
## x1           0.56500    0.09231    6.12
## x2           0.84500    0.09231    9.15
## 
## Correlation of Fixed Effects:
##    (Intr) x1    
## x1 -0.577       
## x2 -0.577  0.000

So here's the big question: what happened?

For a random effect model to buy you anything, there must be variance between \(i's\): the subject-level means of the predictors have to differ across subjects.

For a fixed effect model to give you anything different from pooled OLS, the within (demeaning) transformation has to remove something, which again requires that the subject means of the predictors differ across subjects.

In a conjoint, neither condition holds, as we saw with our descriptive tables above. For every \(i\), \(t\) combination, each subject sees exactly the same values of \(x1\) and \(x2\), so every subject has an identical mean on every predictor. The only thing that varies across subjects is the observed value of \(y\). That's it.
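You can see the consequence for the fixed effect (within) estimator directly: demeaning \(x1\) by subject subtracts the same .5 from every observation, so the within-transformed predictor is just a shifted copy of the pooled one. Here is a small check, reusing our dataframe:

# Subject-demeaning x1 subtracts the same constant (.5) from every row,
# which is why the within estimates match the pooled OLS estimates
conjoint.df %>%
  group_by(SubjectID) %>%
  mutate(x1_within = x1 - mean(x1)) %>%
  ungroup() %>%
  distinct(x1, x1_within)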

If it helps, you can also think of a conjoint like a within-subjects/repeated-measures ANOVA…

anova.model=aov(y ~ (x1 + x2) + Error(SubjectID), data = conjoint.df)
summary(anova.model)
## 
## Error: SubjectID
##           Df   Sum Sq  Mean Sq F value Pr(>F)
## Residuals  1 0.005419 0.005419               
## 
## Error: Within
##            Df Sum Sq Mean Sq F value   Pr(>F)    
## x1          1   31.9   31.92   37.30 2.42e-09 ***
## x2          1   71.4   71.40   83.44  < 2e-16 ***
## Residuals 396  338.9    0.86                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(lm(anova.model))
## 
## Call:
## lm(formula = anova.model)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4558 -0.7366  0.1053  0.5483  2.2649 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.8860606  0.1138663  25.346  < 2e-16 ***
## x1          0.5650000  0.0925054   6.108 2.42e-09 ***
## x2          0.8450000  0.0925054   9.135  < 2e-16 ***
## SubjectID   0.0001275  0.0016023   0.080    0.937    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9251 on 396 degrees of freedom
## Multiple R-squared:  0.2337, Adjusted R-squared:  0.2279 
## F-statistic: 40.25 on 3 and 396 DF,  p-value: < 2.2e-16

This means, effectively, that a simple OLS model is all that is necessary to estimate a model from a metric conjoint design. In a conjoint, you have removed the multilevel structure of the data by design, which is why I say that conjoint is a multilevel design that isn't really multilevel.

The effect sizes, then, are simply the expected change in \(y\) as \(x\) moves from its low (or baseline) condition to its high (or present, etc.) condition. The other nice feature of conjoint analyses is that because the \(x's\) are perfectly uncorrelated and share the same scale of measurement, we can interpret the coefficients as weights: \(x2\) yields a larger shift in \(y\) than does \(x1\).
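To make that interpretation concrete, here is a small illustration using the OLS model and the scenario definitions we already created: the predicted mean rating for each scenario is just the intercept plus the relevant coefficient(s).

# Predicted mean rating for each of the four scenarios; moving x1 from 0 to 1
# shifts the prediction by its coefficient, and likewise for x2
cbind(scenarios.df, y_hat = predict(ols.model, newdata = scenarios.df))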

One thing that isn't commonly done in conjoint discussions is a direct comparison of the coefficient estimates. Ok, \(x2\) looks like it is larger than \(x1\), but is it statistically greater? For that, we run a test of whether the coefficients are equivalent…

library(car)
linearHypothesis(ols.model, "x1 = x2")
## Linear hypothesis test
## 
## Hypothesis:
## x1 - x2 = 0
## 
## Model 1: restricted model
## Model 2: y ~ x1 + x2
## 
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1    398 342.79                              
## 2    397 338.87  1      3.92 4.5924 0.03272 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

That's close! Yes, they are statistically different from each other (p = .033), but only just.


Clustered standard errors

Now, everything said so far about using OLS to analyze conjoint data concerns the consistency of the parameter estimates; it says nothing about the consistency of inference. Here we do have a problem unless we do something about it.

Recall that regression assumes the residuals are independently and identically distributed. In practice, this means that each observation is random; there is nothing systematic that connects one observation to another. The i.i.d. assumption is critical to both the consistency of the parameters and the consistency of inference. In multilevel/repeated measures data we violate that assumption, because each observation, \(t\), nests within its \(i\), even in the case of the conjoint design.
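One way to gauge how much clustering we actually have to worry about (a rough check, reusing the lme4 model fit above) is the intraclass correlation implied by the random intercept model, that is, the share of the total variance attributable to subjects. With the variance components reported earlier it works out to roughly .002, which is why the violation here is slight.

# Intraclass correlation from the random intercept model: subject variance
# divided by (subject variance + residual variance)
vc <- as.data.frame(VarCorr(mixed.model))
vc$vcov[1] / sum(vc$vcov)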

We can see the violation, although in our simulation it is slight, with a Scale-Location plot. We’re looking for a perfectly flat line…

plot(ols.model, which = c(3))

One caution here: base R's summary() method for an lm fit does not actually have a cluster argument, so the call below quietly ignores it and reports the conventional standard errors. In our simulation the clustering is so slight that this makes essentially no difference, but to genuinely cluster on subject we need a cluster-robust variance estimator; a sketch using the sandwich and lmtest packages follows below.

summary(ols.model, cluster=c("SubjectID"))
## 
## Call:
## lm(formula = y ~ x1 + x2, data = conjoint.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4575 -0.7375  0.1075  0.5425  2.2625 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.89250    0.08001  36.151  < 2e-16 ***
## x1           0.56500    0.09239   6.115 2.31e-09 ***
## x2           0.84500    0.09239   9.146  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9239 on 397 degrees of freedom
## Multiple R-squared:  0.2337, Adjusted R-squared:  0.2298 
## F-statistic: 60.52 on 2 and 397 DF,  p-value: < 2.2e-16

Keep in mind that the standard error for x1 and x2 will be the same because each variable has the exact same mean, variance, and number of observations.
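If you want standard errors that are genuinely clustered on subject, one option (a minimal sketch, assuming the sandwich and lmtest packages are installed) is to pair a cluster-robust covariance matrix with coeftest():

library(sandwich)
library(lmtest)

# Cluster-robust covariance matrix, clustered on SubjectID
cl.vcov <- vcovCL(ols.model, cluster = conjoint.df$SubjectID)

# Coefficient table using the clustered standard errors; with so little
# within-subject clustering in this simulation, these should be very close
# to the conventional standard errors reported above
coeftest(ols.model, vcov. = cl.vcov)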


Cross-level interactions

One of the arguments in favor of conjoint/mixed effect modeling is that it allows you to estimate interaction effects from the higher level to the lower. So, for example, if our \(i's\) are managers and our \(t's\) are decisions, we can specify an interaction effect between the manager's age (an \(i\)-level variable) and the lower-level decision attributes.

In a conjoint, however, given that we've effectively removed the multilevel structure, you can estimate the interaction effect with a simple pooled OLS model. In this case, the moderator changes the pooled, average effect of \(x\) on \(y\).

Let's add a continuous moderator to our dataset…

set.seed(08022003)
conjoint.df <- conjoint.df %>%
  group_by(SubjectID) %>%
    mutate(m = rnorm(length(SubjectID),0,1)[1])
head(conjoint.df, 10)
## Source: local data frame [10 x 8]
## Groups: SubjectID [3]
## 
##    SubjectID ScenarioID    x1    x2   u_i        e_it     y          m
##        <int>      <dbl> <dbl> <dbl> <dbl>       <dbl> <dbl>      <dbl>
## 1          1          1     0     0     0 -0.91727454     2 -0.9172745
## 2          1          2     1     0     0  0.25299321     4 -0.9172745
## 3          1          3     0     1     0  0.02692579     4 -0.9172745
## 4          1          4     1     1     0  0.12997585     4 -0.9172745
## 5          2          1     0     0     0  0.32177822     3  0.3217782
## 6          2          2     1     0     0  0.24869389     4  0.3217782
## 7          2          3     0     1     0  0.84257419     5  0.3217782
## 8          2          4     1     1     0 -0.00472125     4  0.3217782
## 9          3          1     0     0     0 -1.10891749     2 -1.1089175
## 10         3          2     1     0     0  0.38668977     4 -1.1089175

Now let's specify an interaction model with our pooled OLS model…

interaction.model <- lm(y ~ x1*m + x2, data = conjoint.df)
summary(interaction.model)
## 
## Call:
## lm(formula = y ~ x1 * m + x2, data = conjoint.df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5432 -0.4550 -0.1266  0.5632  2.1231 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.92801    0.07340  39.891  < 2e-16 ***
## x1           0.53058    0.08482   6.255 1.03e-09 ***
## m            0.55580    0.06290   8.837  < 2e-16 ***
## x2           0.84500    0.08463   9.985  < 2e-16 ***
## x1:m        -0.53873    0.08895  -6.057 3.24e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8463 on 395 degrees of freedom
## Multiple R-squared:  0.3603, Adjusted R-squared:  0.3538 
## F-statistic: 55.61 on 4 and 395 DF,  p-value: < 2.2e-16

We get the same result, by the way, if we use a mixed effect model…

int.mixed.model <- lmer(y ~ x1*m + x2 + (1 | SubjectID), data = conjoint.df)
summary(int.mixed.model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ x1 * m + x2 + (1 | SubjectID)
##    Data: conjoint.df
## 
## REML criterion at convergence: 1014.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.0052 -0.5376 -0.1496  0.6655  2.5088 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  SubjectID (Intercept) 0.0000   0.0000  
##  Residual              0.7162   0.8463  
## Number of obs: 400, groups:  SubjectID, 100
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  2.92801    0.07340   39.89
## x1           0.53058    0.08482    6.26
## m            0.55580    0.06290    8.84
## x2           0.84500    0.08463    9.98
## x1:m        -0.53873    0.08895   -6.06
## 
## Correlation of Fixed Effects:
##      (Intr) x1     m      x2    
## x1   -0.578                     
## m     0.055 -0.047              
## x2   -0.576  0.000  0.000       
## x1:m -0.039  0.067 -0.707  0.000

Bottom line? Even with a panel-invariant (\(i\)-level) moderator, we don't need to use a multilevel/mixed effect estimator. We certainly can, but we don't have to.


Final thoughts

So what's the moral of our story? Well, I'm a big fan of conjoint studies. That said, most of the time the empirical approach in these papers is more complicated than it needs to be. The parameter estimates are not wrong, assuming there are no omitted variables (a big assumption!), but there is no need to use a multilevel/mixed effect approach to estimate these models. Don't forget, though, to cluster those standard errors!



Brian S. Anderson, Ph.D. | Assistant Professor of Entrepreneurship | andersonbri@umkc.edu
Henry W. Bloch School of Management | University of Missouri - Kansas City
entrepreneurship.bloch.umkc.edu
(c) Brian S. Anderson, 2017