Drawing Nomological Conclusions From Observational Data & Meta-Analyses

ENT5587B - Research Design & Theory Testing II

Brian S. Anderson, Ph.D.
Assistant Professor
Department of Global Entrepreneurship & Innovation
andersonbri@umkc.edu


RIEI Logo
© 2017 Brian S. Anderson

  • How goes it?
  • IRB Applications
  • Slack
  • New presentation format and approach!
  • Review of causal reasoning
  • Strengths and weaknesses of meta-analyses in management/entrepreneurship research
  • Lab 26 Jan: Paper critique exercise

IRB Applications…

Using Slack…

New presentation format…

Welcome to reveal.js!

reveal.js is an HTML presentation framework that, paired with R Markdown, makes it easy to create interactive HTML-based presentations.

It has a number of advantages over Keynote (let alone over PowerPoint!), and it’s going to be my new standard.

For example…

It’s really easy to include equations: \(y=\alpha+{\beta}_{i}{x}_{i}+{\beta}_{j}{x}_{j}+...+{\beta}_{k}{x}_{k}+\varepsilon\)

And inline code is a breeze (great for me!): myVar <- x + y

Code snippets work really well, making it easy to follow along…

library(readr)
library(tidyverse)
my.ds <- read_csv("http://a.web.umkc.edu/andersonbri/ENT5587.csv")
my.df <- my.ds %>%
  select(Innovativeness, Proactiveness, RiskTaking, SGR) %>%
  na.omit()
head(my.df, 10)
## # A tibble: 10 × 4
##    Innovativeness Proactiveness RiskTaking        SGR
##             <dbl>         <dbl>      <dbl>      <dbl>
## 1        3.000000      5.333334   4.000000  11.863560
## 2        1.000000      1.000000   1.000000  16.627127
## 3        4.000000      2.666667   3.666667   3.054168
## 4        5.333334      4.000000   4.666666 -13.711043
## 5        2.666667      3.666667   2.333333   4.531741
## 6        2.000000      4.333334   3.333333  10.531465
## 7        2.333333      5.000000   4.333334  -2.163424
## 8        3.666667      3.333333   5.000000  -3.864490
## 9        4.666666      5.000000   4.000000   4.332236
## 10       2.000000      5.333334   4.000000 -18.299290

And plots look fantastic!
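For example, here’s a quick look at the data we just loaded (a minimal sketch; it assumes the my.df data frame from the chunk above and uses ggplot2, which the tidyverse call already loaded):

library(ggplot2)
ggplot(my.df, aes(x = Innovativeness, y = SGR)) +  # Scatter of innovativeness vs. sales growth
  geom_point(alpha = .6) +
  geom_smooth(method = "lm") +  # Overlay a simple linear fit
  labs(x = "Innovativeness", y = "Sales growth rate (SGR)")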

But the biggest reason is pedagogical.

I’m changing my approach and providing my decks in advance. I want you to practice more with R and experimenting with the code and analyses before class. I also want our classroom discussions to be more interactive, and with focused note-taking.

This is a BIG change for me!

Just a little review about establishing causality…

The three necessary and sufficient requirements…

  • Covariation between X and Y
  • Temporal sequencing (X precedes Y)
  • A non-spurious relationship (eliminating alternate causes)

What is our gold standard again?

And even in a randomized controlled experiment, can we ever be sure that we have perfect random assignment?

So given that we’re mere mortals, what is our best practice recommendation for analyses of data collected in an experiment?

Instruments are ALWAYS our friend, and we will be using them a lot over the course of the term.

We will also spend time talking about how to identify valid instruments, including evaluating the exclusion restriction.

Remember that our goal is to establish ignorability, which according to the Rubin counterfactual framework allows us to ignore alternate explanations for the phenomenon of interest, effectively isolating the focal effect of X on Y.

Under observational designs, we do not have random assignment to the treatment condition, so it’s never appropriate to assume ignorability.

We really don’t know if the effect we’ve observed is the relationship between X and Y or if an unobserved factor, Z, is really behind the result.

We’ll come back to this later, but the material point is that absent steps taken to establish conditional ignorability, the safe bet is that the observed effect in an observational design is biased at best, or spurious at worst.

Why does that matter for a discussion of meta-analyses in the management and entrepreneurship literature?

Because unless the data underlying the meta-analysis comes from a randomized controlled experiment, or unless the effect size has been corrected for potential endogeneity, any meta-analysis using this data is almost certainly wrong.

Snap.

Despite their popularity—meta-analyses tend to be exceptionally well cited—they have very limited utility to our field because of the variability of the underlying studies. Feel free to cite them, because a reviewer will probably ask you to anyway, but I wouldn’t pay all that much attention to them.

Yeah I know, I’m curmudgeonly like that.

Lest you think that I’m alone in my criticisms of meta-analyses, take a look at this recent paper arguing that the vast majority of meta-analyses in the medical field are, to put it mildly, decidedly unhelpful to medical practice.

The logic underpinning meta-analysis is, however, quite sound. This also means that a well-done meta-analysis—starting with the decision of which studies should be included in the analysis—is valuable to science.

Let’s walk through the basic argument for the value of a meta-analysis…

Let’s assume a randomized controlled experiment, of sufficient power, investigating the effect of positive affect on opportunity recognition. The basic hypothesis is that the happier I am, the more likely I am to recognize new opportunities (no, this isn’t a real theory, or at least I don’t think it is!).

In the study, the researchers identified a marginal effect of the treatment on the criterion (\({\beta}\) = .2, p = .05, N = 1,000).

So far so good, but remember that under a frequentist approach, a p value of .05 means that if the null hypothesis were true, there is a 1 in 20 probability of observing an effect this large or larger by random chance alone.

If this study is the 1 in 20, what type of error is that again?

Now, the false positive rate in practice is likely much higher than 1 in 20, and you should spend some time understanding why. But let’s assume that we’ve got 100 studies of the effect of affect on opportunity recognition. We should expect at least 5% of them to have reported a spurious effect.
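If you want to convince yourself of the 1 in 20 logic, here’s a quick simulation sketch (my own illustration, not tied to any real studies): generate 100 ‘studies’ in which the true effect is zero and count how many come back significant at p < .05.

set.seed(5587)  # Arbitrary seed for reproducibility
n.studies <- 100
sample.size <- 1000
false.positive <- replicate(n.studies, {
  x <- rnorm(sample.size)  # Treatment with NO true effect on y
  y <- rnorm(sample.size)  # Outcome generated independently of x
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"] < .05
})
mean(false.positive)  # In the long run, roughly 5% of these null studies come up 'significant'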

We also know that Type I errors tend to produce inflated effect sizes, so not only do we have spurious results in our sample of 100 studies, we also likely have some reported effect sizes that are outside of the true 95% confidence interval.
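The inflation point is easy to see with a similar sketch (again, purely illustrative numbers): simulate underpowered studies of a small true effect and compare the average estimate across all of the studies with the average among only the studies that happened to reach significance.

set.seed(5587)
true.beta <- .2
n <- 50  # Deliberately small samples, so each study is underpowered
sims <- replicate(1000, {
  x <- rnorm(n)
  y <- true.beta * x + rnorm(n)
  fit <- summary(lm(y ~ x))$coefficients
  c(fit["x", "Estimate"], fit["x", "Pr(>|t|)"])  # Return the estimate and its p value
})
mean(sims[1, ])  # Average estimate across ALL studies recovers roughly .2
mean(sims[1, sims[2, ] < .05])  # Average among the 'significant' studies only is inflated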

Enter the meta-analysis.

It’s common to think of meta-analyses as studies of studies, but that’s not really accurate. A meta-analysis is really a statistical examination of the reported effect sizes in a set of studies, ideally of the SAME treatment on the SAME outcome under the SAME experimental conditions.

We’ll talk more about these assumptions later, but think about meta-analysis as Olympic figure skating in the 1970s-1980s. We know that the East German judge is biased and the American judge might go the opposite direction, so we’re going to throw those scores out, and the net result is closer to the ‘true’ score for the skater.

In a meta-analysis we don’t throw out any data, but there are a number of statistical tools available to adjust the estimated effect size for a variety of different sources of error, ostensibly giving us a better understanding of what the population—true—effect of X on Y actually is.
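At its core, the pooled estimate is just a precision-weighted average of the reported effect sizes. As a minimal sketch of the standard inverse-variance logic (my notation, not taken from any specific study): the pooled effect is \(\hat{\mu}=\frac{\sum_{i}{w}_{i}{y}_{i}}{\sum_{i}{w}_{i}}\), where \({y}_{i}\) is the effect size reported by study \(i\), \({v}_{i}\) is its sampling variance, and the weights are \({w}_{i}=1/{v}_{i}\) under a fixed effect model or \({w}_{i}=1/({v}_{i}+{\tau}^{2})\) under a random effects model. Adding the between-study variance \({\tau}^{2}\) to the weights is what widens the confidence interval when the studies disagree with one another.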

As you have all found out, conducting original research that maximizes causal inference and that addresses an important knowledge gap is cough, cough quite difficult.

Meta-analysis, by comparison, is quite a bit simpler. So much so that it is not uncommon to see so-called ‘meta-analysts’ who specialize in the production of meta-analytic reviews of, well, actual scientists’ work.

Don’t be one of those people.

Now, I am being a little glib here, in that you must invest considerable thought into defining the criteria and exclusion rules for which studies will be included in the review. How well defined those criteria are, and how closely the analyst adheres to them, largely determines the quality of the resulting analysis.

GIGO is a very real thing here.

Along with the strength of the causal inference in the original study comes the notion of study heterogeneity, or variability.

Variability could be a class or two by itself, but the basic issue is that differences in experimental design, measurement, analytic approach, sampling strategy, etc., induce differences between studies.

These differences may be random, may be systematic, and may—or may not—influence the estimate of the population effect size.

Ideally, you would employ a fixed effect specification for your meta-analysis, under the assumption that there is not a statistically significant difference in the parameter estimate as a function of the originating study itself.

Rarely, though, does this assumption hold, so a random effects specification, which explicitly models between-study heterogeneity, is most common. There’s also a statistical test for it (the Q test, which you’ll see in the output below).
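In model form (a sketch of the standard random effects formulation): \({y}_{i}=\mu+{u}_{i}+{\varepsilon}_{i}\), with \({u}_{i}\sim N(0,{\tau}^{2})\) capturing between-study heterogeneity and \({\varepsilon}_{i}\sim N(0,{v}_{i})\) capturing each study’s sampling error. The fixed effect model is the special case where \({\tau}^{2}=0\), and the Q statistic tests that null of homogeneity.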

But the actual data analysis part of a meta-analysis is pretty straightforward. There are, though, some key judgment calls that the analyst has to make that may materially influence the results. To illustrate, fire up R and let’s do an example.

There are several R packages that handle meta-analyses, but we’re going to focus on metafor.

install.packages("metafor")

We’re going to start with the basics: load up the following completely made-up data set and let’s take a look at what we have.

library(readr)
metaanalysis.ds <- read_csv("http://a.web.umkc.edu/andersonbri/ENT5587MetaData.csv")
head(metaanalysis.ds, 10)  # Let's take a look at the data set
## # A tibble: 10 × 5
##    `StudyID` YearPub EffectSize    SE PercCollegeDegree
##        <int>   <int>      <dbl> <dbl>             <dbl>
## 1          1    2006       0.33 0.042              0.76
## 2          2    2006       0.39 0.051              0.46
## 3          3    2006       0.50 0.039              0.79
## 4          4    2006       0.39 0.016              0.75
## 5          5    2006       0.41 0.039              0.49
## 6          6    2006       0.31 0.064              0.46
## 7          7    2006       0.21 0.037              0.67
## 8          8    2006       0.11 0.041              0.76
## 9          9    2006       0.45 0.068              0.45
## 10        10    2006       0.20 0.053              0.06

Seriously, that’s all you need for a simple meta-analysis of a single X –> Y relationship. Told you it wasn’t that hard.

Ok, if you were doing this for real you would have additional variables, but really, not that much more.

Now let’s do our analyses using the metafor package and walk through the results.

library(metafor)
mymodel.meta <- rma.uni(yi = EffectSize,  # Our focal parameter (effect size from each study)
                        sei = SE,  # Standard error of each estimate (vi would expect sampling variances)
                        data = metaanalysis.ds,  # Data set
                        control = list(stepadj = .5),  # For convergence
                        method = "ML")  # Estimator for tau^2
summary(mymodel.meta)
## 
## Random-Effects Model (k = 20; tau^2 estimator: ML)
## 
##   logLik  deviance       AIC       BIC      AICc  
##   7.5945   13.9701  -11.1890   -9.1976  -10.4831  
## 
## tau^2 (estimated amount of total heterogeneity): 0.0069 (SE = 0.0095)
## tau (square root of estimated tau^2 value):      0.0834
## I^2 (total heterogeneity / total variability):   20.60%
## H^2 (total variability / sampling variability):  1.26
## 
## Test for Heterogeneity: 
## Q(df = 19) = 18.1427, p-val = 0.5129
## 
## Model Results:
## 
## estimate       se     zval     pval    ci.lb    ci.ub          
##   0.2821   0.0443   6.3645   <.0001   0.1952   0.3689      *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The \({\tau}\) (tau) parameter is the square root of \({\tau}^{2}\), our estimate of the between-study heterogeneity (variance). The Q statistic tests the null hypothesis that there is no meaningful difference in the parameter estimate from study to study; in other words, that the variability we see across studies is just sampling error.

In our model, Q = 18.1427, p = 0.5129, so we would interpret this how?

The non-significant Q gives us little evidence of meaningful between-study heterogeneity, so a fixed effect specification would arguably have been defensible here. Under our default random effects model, the estimated population effect size is .2821, p < .001, with a standard error of .0443 and a 95% confidence interval ranging from .1952 to .3689.
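Given the non-significant Q, one sensible robustness check (my suggestion, not something the output requires) is to re-fit the model as a fixed effect specification and compare the pooled estimate and its confidence interval:

mymodel.fe <- rma.uni(yi = EffectSize,  # Same effect sizes as before
                      sei = SE,
                      data = metaanalysis.ds,
                      method = "FE")  # Fixed effect estimator
summary(mymodel.fe)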

So now what?

Well, here’s where a lot of judgment calls and options come into play, and I’d refer you to the metafor documentation for a good discussion of all of them.

One is investigating potential moderators. These may or may not be present in your data as measured variables, but they are likely there nonetheless, because significant between-study heterogeneity often indicates that contextual factors are changing the nature of the X –> Y relationship.
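As an example, our made-up data set includes PercCollegeDegree, which we could treat as a candidate moderator with a simple meta-regression (a sketch using metafor’s mods argument; given the non-significant Q here, don’t expect a dramatic result):

mymodel.mods <- rma.uni(yi = EffectSize,
                        sei = SE,
                        mods = ~ PercCollegeDegree,  # Study-level moderator
                        data = metaanalysis.ds,
                        method = "ML")
summary(mymodel.mods)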

Another analysis you see often is a discussion of how many of the original studies fall within the confidence interval estimated by the model.

We can do that programmatically fairly easily…

library(dplyr)
myCI.lb <- mymodel.meta$ci.lb  # Get the lower bound of the c.i.
myCI.ub <- mymodel.meta$ci.ub  # Get the upper bound of the c.i.
myEffectCount.df <- metaanalysis.ds %>%  # Filter results to a new d.f.
                    filter(EffectSize >= myCI.lb) %>%
                    filter(EffectSize <= myCI.ub)
originalStudy.count <- nrow(metaanalysis.ds)  # Get the number of original studies
subsetStudy.count <- nrow(myEffectCount.df)  # Count the number of matches
myPercent <- (subsetStudy.count / originalStudy.count)*100
cat("Percentage of original studies within 95% c.i.: ", myPercent,"%")  # Print it out
## Percentage of original studies within 95% c.i.:  35 %

We can look at this graphically with a forest plot.

par(mar=c(0,1,0,1))  # This just changes the plot's margins
forest(mymodel.meta)

So just over a third of the original studies fall within the 95% confidence interval around our meta-analytic estimate of the true effect size.

What conclusion might you draw from this collection of studies? HINT…take a minute and estimate the range of effect sizes in the data (a histogram might be nice here!).
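One quick way to do that (a minimal sketch in base R, reusing the myCI.lb and myCI.ub objects from the chunk above):

range(metaanalysis.ds$EffectSize)  # Spread of the reported effect sizes
hist(metaanalysis.ds$EffectSize,
     breaks = 10,
     main = "Reported effect sizes across studies",
     xlab = "Effect size")
abline(v = c(myCI.lb, myCI.ub), lty = 2)  # Mark the 95% c.i. bounds from the meta-analysis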

From my perspective, just like most research designs, meta-analyses done well (where ‘well’ means that the method the researcher uses aligns with the assumptions of the estimator) can provide useful insights.

That said, there are more poorly done meta-analyses than there are helpful ones, and the low barrier to conducting (and publishing) meta-analyses doesn’t help the field much.

I think Bobko and Roth (2008: 115) put it best…“[M]eta-analysis is not necessarily much better than the primary data, thinking, and theory that went into it at the beginning.”

Wrap-up.

Lab 26 Jan – Paper Critique

Seminar 30 Jan – LDV Models