I’m a big fan of logit models. Mostly because, in my opinion, they offer the clearest way to answer questions that are usually the most interesting to academics and entrepreneurs. Because we can turn the results of a logit model into a set of predicted probabilities, they let us answer questions like, “what’s the probability of a new business making a first sale given these conditions?” and “what’s the probability of an entrepreneur starting a new business given that he/she started a business before?”

The kicker, as I’ve found teaching logit models, is that while the final outcome is very helpful, getting there isn’t so easy. The path to probability is also a reason why logit and other limited dependent variable models are often misinterpreted in strategy and entrepreneurship research.

So this post walks through a logit model with two predictor variables, \(x1\) and \(x2\), that are both dichotomous (0/1). In a future Part II, I’ll do a logit model where \(x1\) and \(x2\) are both continuous variables.

We’ll start by generating the data with two dichotomous predictors, which I’ll call Model 1.

```
library(tidyverse) # Love the tidyverse!
# Set the random seed generator to ensure reproducible results
set.seed(080203)
# Create a new dataframe with 1,000 observations and our two predictors
model1.df <- data_frame(x1 = rbinom(1000, 1, .5),
x2 = rbinom(1000, 1, .5))
# Next we'll create our predicted value of y
model1.df <- model1.df %>%
mutate(y = (.75*x1 + 1.5*x2)) %>%
mutate(pr_y = exp(y)/(1 + exp(y))) %>%
mutate(y = rbinom(1000, 1, pr_y)) %>%
select(-pr_y)
# Export the data file to a csv for data sharing
write.csv(model1.df, file = "logit_sim_data.csv", row.names = FALSE)
# Now lets take a look
head(model1.df, 10)
```

```
## # A tibble: 10 × 3
## x1 x2 y
## <int> <int> <int>
## 1 1 1 1
## 2 0 0 0
## 3 1 1 1
## 4 0 0 0
## 5 1 1 1
## 6 0 1 1
## 7 0 1 0
## 8 0 0 0
## 9 0 1 0
## 10 1 0 1
```

A logistic regression model is simply a logistic transformation of the linear model, so we can write our model in a familiar form…

\(ln\left[\frac{p}{1-p}\right]=\alpha+{\beta}{x_1}+{\beta}{x_2}+\epsilon\)

Our outcome variable is the probability of \(y\) occurring (y = 1), which we write as \(\left[\frac{p}{1-p}\right]\), the probability of \(y\) occurring divided by the probability of \(y\) not occurring.

We’ll use the `glm`

function to estimate our model in R, and specify the binomial link function, which gives us our logistic transformation.

```
model1 <- glm(y ~ x1 + x2, data = model1.df, family = "binomial")
summary(model1)
```

```
##
## Call:
## glm(formula = y ~ x1 + x2, family = "binomial", data = model1.df)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1750 -1.2270 0.6248 0.8443 1.1286
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.1160 0.1197 0.969 0.333
## x1 0.7321 0.1566 4.675 2.94e-06 ***
## x2 1.4185 0.1604 8.846 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1137.7 on 999 degrees of freedom
## Residual deviance: 1034.1 on 997 degrees of freedom
## AIC: 1040.1
##
## Number of Fisher Scoring iterations: 4
```

Here is where things get complicated.

Lets take a look at the estimated coefficients…

```
log.odds <- summary(model1)$coefficients[1:3,1]
log.odds
```

```
## (Intercept) x1 x2
## 0.1159635 0.7320719 1.4185449
```

Our \(y\) variable is on the logit scale. That means that the coefficient estimates are the **log odds** of \(y\) occurring given a unit change in the two predictors. The intercept in this case is the log odds of \(y\) occurring when both \(x1\) and \(x2\) are zero.

Log odds are not particularly helpful, so we can convert them from the logit scale by taking the exponent:

```
exp.odds <- exp(log.odds)
exp.odds
```

```
## (Intercept) x1 x2
## 1.122955 2.079384 4.131105
```

The exponented coefficients are not the “effect” of \(x\) on \(y\) as would be the common interpretation of a regular linear regression. They also aren’t the likelihood of \(y\) occurring, which is really a statement of probability (we’ll get to that). Because \(x1\) is dichotomous, we interpret \(x1\) as the expected change, on average, in the odds of \(y\) occurring as \(x1\) shifts from 0 to 1, holding constant \(x2\).

Also because \(x1\) is dichotomous, the coefficient estimate is equivalent to the **marginal effect** of \(x1\) on \(y\), holding constant \(x2\) (more on that later).

So what we can say is that the odds of \(y\) occurring are just over two times higher in the presence of \(x1\) (\(x1\) = 1) then when \(x1\) is not present (\(x1\) = 0), holding \(x2\) constant. To put it another way, the odds of \(y\) occurring given the presence of \(x1\) and holding \(x2\) constant are roughly 2.08 to 1 (2.08:1).

Odds are helpful, but they really are useful when it comes to predicting the probability that \(y\) occurs at discrete values of the two predictors.

In our model, given that both \(x1\) and \(x2\) are dichotomous, we have four different possible probabilities of \(y\) predicted by our model, and I’ve shown each of them with the corresponding equation:

Probability | x1 | x2 | Equation |
---|---|---|---|

Pr_y | 0 | 0 | \(ln\left[\frac{p}{1-p}\right]=\alpha\) |

Pr_y | 1 | 0 | \(ln\left[\frac{p}{1-p}\right]=\alpha+{\beta}{x_1}\) |

Pr_y | 0 | 1 | \(ln\left[\frac{p}{1-p}\right]=\alpha+{\beta}{x_2}\) |

Pr_y | 1 | 1 | \(ln\left[\frac{p}{1-p}\right]=\alpha+{\beta}{x_1}+{\beta}{x_2}\) |

Yes, there is a shortcut to doing this in R, but it’s helpful to walk through how to do this by hand. Lets calculate the second row in the table—the probability that \(y\) occurs when \(x1\) occurs (\(x1\) = 0) and \(x2\) does not occur (\(x2\) = 0). Recall that the equation for this model is \(ln\left[\frac{p}{1-p}\right]=\alpha+{\beta}{x_1}\), or equivalently: \(Pr\_ y=\frac{1}{1+{e}^{-(\alpha+{\beta}{x_{1}})}}\) (see Wooldridge, 2012).

```
model1.intercept <- (summary(model1)$coefficients[1,1])
model1.betax1 <- (summary(model1)$coefficients[2,1])
pr_y = 1 / (1 + (exp(-1 * (model1.intercept + model1.betax1))))
pr_y
```

`## [1] 0.7001548`

On average and for this sample, there is a 70% probability of \(y\) occurring given that \(x1\) occurred **and** \(x2\) did not.

Ok, now for the shortcut. We can use R’s `predict`

function to pass all of the possible combinations of from our table and give us the predicted probabilities of each. Note that we must use the ‘response’ option to return the predicted probabilities, as opposed to the log odds.

```
# We start with a new dataframe that contains our possible...
# ...values of x1 and x2, following our table above
model1.values <- data_frame(x1 = c(0, 1, 0, 1),
x2 = c(0, 0, 1, 1))
model1.values
```

```
## # A tibble: 4 × 2
## x1 x2
## <dbl> <dbl>
## 1 0 0
## 2 1 0
## 3 0 1
## 4 1 1
```

We then pass the new dataframe to the `predict`

function, and we’ll set the `se.fit = TRUE`

option to return the standard errors for the prediction, so that we can construct confidence intervals.

```
model1.predict <- predict(model1, newdata = model1.values, type = "response", se.fit = TRUE)
# Create a new dataframe to store predictions and calculated confidence...
# ...intervals using +/- 1.96 for a 95% interval
model1.predict.df <- data_frame(pr_y = model1.predict$fit,
lower.ci = model1.predict$fit - (1.96 * model1.predict$se.fit),
upper.ci = model1.predict$fit + (1.96 * model1.predict$se.fit))
# Lets also convert these values to percentages to make them...
# ...easier to manage and interpret
model1.predict.df <- model1.predict.df %>%
mutate(pr_y = 100 * round(pr_y, 4),
lower.ci = 100 * round(lower.ci, 4),
upper.ci = 100 * round(upper.ci, 4))
```

Now we can bind our predicted values dataframe to our model values dataframe (the one with the x1 and x2 values) to make a nice output table:

```
library(knitr)
model1.outcome <- bind_cols(model1.values, model1.predict.df)
kable(model1.outcome, caption = "Predicted probabilites of y at possible combinations of x1 and x2")
```

x1 | x2 | pr_y | lower.ci | upper.ci |
---|---|---|---|---|

0 | 0 | 52.90 | 47.05 | 58.74 |

1 | 0 | 70.02 | 64.75 | 75.28 |

0 | 1 | 82.27 | 78.34 | 86.19 |

1 | 1 | 90.61 | 87.86 | 93.35 |

While I’m a big fan of visualizations, there’s not much to plot with these two variables—all we get are point estimates. We’ll tackle plotting predicted probabilities in Logit Part II with continuous variables.

Henry W. Bloch School of Management | University of Missouri - Kansas City

(c) Brian S. Anderson, 2017