In Part I, I outlined interpreting logistic regression models with two dichotomous predictors. In this post, we’ll tackle logistic regression with two continuous predictors.

We’ll start by generating the data with two dichotomous predictors, and we’ll call this Model 2.

```
library(tidyverse) # Love the tidyverse!
# Set the random seed generator to ensure reproducible results
set.seed(080203)
# Create a new dataframe with 1,000 observations and our two normally...
# ...distributed predictors.
model2.df <- data_frame(x1 = rnorm(1000),
x2 = rnorm(1000),
e_i = rnorm(1000))
# Next we'll create our predicted value of y and include a random...
# ...disturbance term
model2.df <- model2.df %>%
mutate(y = (.75*x1 + 1.5*x2 + e_i)) %>%
mutate(pr_y = exp(y)/(1 + exp(y))) %>%
mutate(y = rbinom(1000, 1, pr_y)) %>%
select(-pr_y, -e_i)
# Now lets take a look
head(model2.df, 10)
```

```
## # A tibble: 10 x 3
## x1 x2 y
## <dbl> <dbl> <int>
## 1 0.9054952 -0.8219114 0
## 2 1.5866302 0.2418754 1
## 3 0.1654336 0.3295136 1
## 4 -0.5190593 2.2521170 1
## 5 -1.0471148 1.5826300 1
## 6 -0.1464832 -0.2107056 0
## 7 -1.3642726 -0.4808994 0
## 8 0.3521542 -0.7726017 1
## 9 -0.5077717 -0.5513062 0
## 10 -1.7676480 0.3960072 0
```

Just to review, a logistic regression model is simply a logistic transformation of the linear model, so we can write our model in a familiar form…

\(ln\left[\frac{p}{1-p}\right]=\alpha+{\beta}{x_1}+{\beta}{x_2}+\epsilon\)

Our outcome variable is the probability of \(y\) occurring (y = 1), which we write as \(\left[\frac{p}{1-p}\right]\), the probability of \(y\) occurring divided by the probability of \(y\) not occurring.

We’ll use the `glm`

function to estimate our model…

```
model2 <- glm(y ~ x1 + x2, data = model2.df, family = "binomial")
summary(model2)
```

```
##
## Call:
## glm(formula = y ~ x1 + x2, family = "binomial", data = model2.df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3476 -0.8569 0.2387 0.8389 2.9860
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.02863 0.07640 -0.375 0.708
## x1 0.82535 0.08809 9.369 <2e-16 ***
## x2 1.41638 0.10210 13.873 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1386.1 on 999 degrees of freedom
## Residual deviance: 1030.2 on 997 degrees of freedom
## AIC: 1036.2
##
## Number of Fisher Scoring iterations: 4
```

Once again, here is where things get complicated.

Lets take a look at the estimated coefficients…

```
log.odds <- summary(model2)$coefficients[1:3,1]
log.odds
```

```
## (Intercept) x1 x2
## -0.02862927 0.82535153 1.41638390
```

Just as we Model 1 from the last post, the \(y\) variable is on the logit scale. That means that the coefficient estimates are the **log odds** of \(y\) occurring given a unit change in the respective predictor. The intercept in this case is the log odds of \(y\) occurring when both \(x1\) and \(x2\) are zero. Because both our \(x1\) and \(x2\) are normally distributed with a mean of (close to) zero and standard deviation of (close to) one, the intercept is effectively the log odds of \(y\) occurring at roughly the mean value of \(x1\) and \(x2\).

Once again we can convert the log odds to the odds by taking the exponent:

```
exp.odds <- exp(log.odds)
exp.odds
```

```
## (Intercept) x1 x2
## 0.9717767 2.2826831 4.1221872
```

Importantly, the odds shown here are the “average” odds across the range of values of both predictors. That makes them only marginally useful. We can interpret, for example, the “average” odds of \(y\) occurring as 2.28:1, holding constant the effect of \(x2\) at its mean value, which is basically zero in this model. As we will see, the odds—and hence probability—of \(y\) occurring vary dramatically as a function of the value of the predictor.

Side note—with odds you are looking for statistically significant deviations from 1.0. Consider the intercept in the model. There is a 0.97:1 chance of \(y\) occurring when \(x1\) and \(x2\) equal zero (at the mean in the case of our data). The estimate is not statistically significant, which makes sense—1:1 odds is a 50% chance of \(y\) = 1, so it’s just even money, or a coin flip.

The most important thing to remember when working with logit models is that they are *inherently nonlinear*. This means that we cannot express a change in the outcome variable as a function of a linear change in a vector of predictor variables. To put another way, as \(x\) increases/decreases in a nonlinear model, we do not expect a consistent change in the \(y\) variable. In practice, this means that the “average” log odds/odds have very little utility in a logit model. What we really want to see is how the odds itself changes as a \(x\) changes. This is where **marginal effects** come in to play.

The marginal effect in a logit model is the *instantaneous* expected change in the probability of \(y\) occurring for a given unit change in \(x\), assuming \(x\) is a continuous measure. You can think of the marginal effect intuitively as:

\(ME=\frac{\hat{Pr}\_{y}_{1}-\hat{Pr}\_{y}_{2}}{\Delta{x}}\)

Lets walk through the logic. We’ve got our first predicted probability (\(\hat{Pr}\_{y}_{1}\)), when \(x\) took on a particular value. Now we increment \(x\) by some value—could be large, could be small. We then subtract the two probabilities, and divide by the change in \(x\). This gives us the expected change in \(y\) for a given change in \(x\).

In a simple linear regression, \(y=\alpha+\beta{x}+\epsilon\), the estimate of \(\beta{x}\) is the marginal effect—the change in \(y\) is constant as \(x\) increments by some unit. But in a nonlinear model, the marginal effect varies between different changes in \(x\).

We’ll do this first with the `predict`

function, and then do it by hand.

```
ave.model2.predict <- predict(model2, newdata = data.frame(x1 = 0, x2 = 0),
type = "response")
ave.model2.predict
```

```
## 1
## 0.4928432
```

As in Part I, we get this same “average” probability by calculating the predicted probability of \(y\) occurring at the estimated intercept. The intercept is the log odds of \(y\) occurring when \(x1\) and \(x2\) equal zero, which happens to be very close to their mean value in our model.

```
pr_y = 1 / (1 + (exp(-1 * (summary(model2)$coefficients[1,1]))))
pr_y
```

`## [1] 0.4928432`

Now lets increment \(x1\) by a very small amount, say .0001, but then hold \(x\) at its mean value, 0. Then we’ll construct a new predicted probability.

```
delta_x1 <- .0001
inc.model2.predict <- predict(model2, newdata = data.frame(x1 = delta_x1, x2 = 0),
type = "response")
inc.model2.predict
```

```
## 1
## 0.4928638
```

Now from our formula earlier, we’ll take the differences between the two predicted probabilities, and divide them by the change in \(x1\).

```
ame.hand <- (inc.model2.predict - ave.model2.predict)/delta_x1
ame.hand
```

```
## 1
## 0.2062957
```

Because we used the mean value of \(x1\) in the preceding, and used a small change in \(x1\), the marginal effect we calculated by hand is going to be exceptionally close to the marginal effect calculated with calculus, using R’s `margins`

package.

```
library(margins)
# We're setting `margins` to give us the average marginal effect...
# ...when x1 = 0 and x2 = 0; there mean values in our data
ame.margins <- margins(model2, at = list(x1 = 0, x2 = 0))
ame.margins
```

```
## at(x1) at(x2) x1 x2
## 0 0 0.2063 0.354
```

As before though, the average marginal effect doesn’t help us much. Practically speaking, it would be the expected change in the probability of \(y\) occurring as we moved from the mean of \(x1\) and \(x2\) slightly away from the mean. To get a better handle on how a change in \(x1\) changes the probability of \(y\) occurring, we want to look at a range of changes in the value of \(x1\).

Because \(x1\) is continuous, there are, unfortunately, an infinite number of ways to increment \(x1\). This is where theory, and a good understanding the measure, come in to play. In our data, \(x1\) is normally distributed with a standard deviation close to 1.0. In that case, I’ll increment \(x1\) in tenth of a standard deviation changes (.1), starting from where \(x1\) begins to where \(x1\) ends.

As an aside, if \(x1\) was, say, a Likert-type measure from 1-5 or 1-7, an appropriate increment would be 1, which would be the change in the probability of \(y\) occurring as the respondent moved from 1 to 2 to 3 to 4 and so on. Pretty cool, huh!

We’re going to make use of the `margins`

pacakge again for this—it makes life a lot easier. Also, because each marginal effect is itself a point estimate with a standard error, we can see whether the change in the value of \(\beta_{x1}\) is itself statistically significant. We can do that by calling the `summary`

command for `margins`

:

```
library(knitr)
# Setup a new vector of x1 values starting from the lowest ovbserved...
# value to the highest, incrementing along the way by .1
x1.inc <- seq(min(model2.df$x1), max(model2.df$x1), by = .1)
# Now we store the results of the summary function for margins as...
# ...a new dataframe, which makes it easier to manipulate
xi.margins.df <- as.data.frame(summary(margins(model2, at = list(x1 = x1.inc, x2 = 0))))
# Now we're going to filter the dataframe to focus just on the x1 marginal effects
xi.margins.df <- xi.margins.df %>%
filter(factor == "x1")
# Here's what it looks like!
kable(head(xi.margins.df, 10))
```

factor | x1 | x2 | AME | SE | z | p | lower | upper |
---|---|---|---|---|---|---|---|---|

x1 | -3.557843 | 0 | 0.0384812 | 0.0073825 | 5.212478 | 2e-07 | 0.0240117 | 0.0529506 |

x1 | -3.457843 | 0 | 0.0414416 | 0.0075400 | 5.496251 | 0e+00 | 0.0266635 | 0.0562197 |

x1 | -3.357843 | 0 | 0.0445993 | 0.0076729 | 5.812546 | 0e+00 | 0.0295606 | 0.0596379 |

x1 | -3.257843 | 0 | 0.0479622 | 0.0077774 | 6.166843 | 0e+00 | 0.0327187 | 0.0632057 |

x1 | -3.157843 | 0 | 0.0515379 | 0.0078495 | 6.565781 | 0e+00 | 0.0361532 | 0.0669225 |

x1 | -3.057843 | 0 | 0.0553330 | 0.0078851 | 7.017402 | 0e+00 | 0.0398785 | 0.0707875 |

x1 | -2.957843 | 0 | 0.0593533 | 0.0078806 | 7.531531 | 0e+00 | 0.0439075 | 0.0747991 |

x1 | -2.857843 | 0 | 0.0636034 | 0.0078328 | 8.120135 | 0e+00 | 0.0482514 | 0.0789554 |

x1 | -2.757843 | 0 | 0.0680863 | 0.0077392 | 8.797623 | 0e+00 | 0.0529178 | 0.0832548 |

x1 | -2.657843 | 0 | 0.0728032 | 0.0075986 | 9.581162 | 0e+00 | 0.0579103 | 0.0876962 |

Yeah, so that’s good, but not exactly easy to make sense of. The best way to get a handle on marginal effects is to visualize them, along with a confidence interval to better understand the range in which the estimate of \(\beta_{x1}\) changes as a function of a change in \(x1\).

There are several packages out there that do this, but here’s one option using `ggplot2`

and our `xi.margins.df`

dataframe we just created:

```
library(ggplot2)
library(ggthemes)
ame.plot <- ggplot(xi.margins.df, aes(y = AME, x = x1)) +
geom_line() +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha = .2) +
labs(x = "x1",
y = "Average Marginal Effect of x1")
ame.plot + theme_grey()
```

So how do we make sense of the plot? Well, the first thing to notice is that the marginal effect never crosses zero (on the y-axis, not the x1 axis). This tells us that there is always a statistically significant change in the expected probability of \(y\) occurring as \(x1\) itself changes.

So that’s important, but we also see that this change is substantially larger as \(x1\) approaches zero, which is its mean value. Here we see very dramatic changes in the probability of \(y\) occurring as \(x1\) changes. To put another way, when \(x1\) changes from, say, 2 to 2.2, each of those changes will produce about a 12-10% change in the probability of \(y\) occurring. But when \(x1\) changes from 0 to .2, each of those changes will produce more than a 20% change in the probability of \(y\). You get more *bang for the buck* so to speak in changes to the probability of \(y\) when \(x1\) is closer to its mean than you do when \(x1\) is closer to its tails.

When you compare this plot with the *Average Marginal Effect* at the mean value of \(x1\) that we calculated earlier, it’s easy to see that the “average” really doesn’t give us a good understanding of the underlying relationship between \(x1\) and \(y\).

Side note—again this is just the marginal effect of \(x1\) holding \(x2\) constant at its mean value, which in our case happens to be zero. To explore the marginal effect of \(x2\), simply do to \(x2\) what we did to \(x1\), and then hold \(x1\) constant at its mean.

Another way to see how the probability of \(y\) changes as a function of the level of \(x1\) is by plotting the predicted probabilities, along with a confidence interval (always with a confidence interval!). Again there are built in functions—the `margins`

package has one—but I like doing it by hand.

We’re going to start by calling the `predict`

function, and get the standard errors at the same time. Like we did with the `margins`

command, we’re going to ask our function to use the same range of \(x1\) that we used for the marginal effects—the range of \(x1\) in the data, incremented by .1.

```
# BTW, type = "response" ensures that we get predicted probabilities in a...
# normal scale, instead of the log probabilities!
x1.model2.predict <- predict(model2, newdata = data.frame(x1 = x1.inc, x2 = 0),
type = "response", se = TRUE)
# Next we're going to store our results in a dataframe, to calculate...
# ...our confidence intervals.
x1.model2.predict <- as_data_frame(x1.model2.predict) %>%
mutate(PredProb = fit,
lower.ci = (PredProb - (1.96 * se.fit)),
upper.ci = (PredProb + (1.96 * se.fit))) %>%
select(-fit:-residual.scale) # Just removing unnecessary variables
# Now we need to bring our values of x1 back into our predicted dataframe...
# ...so we know which probability goes to which x1!
x1.model2.predict$x1 = x1.inc
kable(head(x1.model2.predict, 10))
```

PredProb | lower.ci | upper.ci | x1 |
---|---|---|---|

0.0490277 | 0.0192426 | 0.0788128 | -3.557843 |

0.0530222 | 0.0217878 | 0.0842565 | -3.457843 |

0.0573226 | 0.0246131 | 0.0900320 | -3.357843 |

0.0619489 | 0.0277444 | 0.0961534 | -3.257843 |

0.0669221 | 0.0312093 | 0.1026349 | -3.157843 |

0.0722638 | 0.0350371 | 0.1094904 | -3.057843 |

0.0779962 | 0.0392591 | 0.1167333 | -2.957843 |

0.0841421 | 0.0439077 | 0.1243765 | -2.857843 |

0.0907246 | 0.0490170 | 0.1324322 | -2.757843 |

0.0977672 | 0.0546226 | 0.1409117 | -2.657843 |

Tables are nice, but pictures are better. So lets plot our predicted probabilities using `ggplot2`

:

```
pred.plot <- ggplot(x1.model2.predict, aes(y = PredProb, x = x1)) +
geom_line() +
geom_ribbon(aes(ymin = lower.ci, ymax = upper.ci), alpha = .2) +
labs(x = "x1",
y = "Predicted Probability of y Occuring")
pred.plot + theme_grey()
```