Data Science for Linguists

Multilevel models

Joseph V. Casillas, PhD

Rutgers University
Spring 2025
Last update: 2025-04-18


Next steps

What we’ve seen

  • MRC
  • Linear regression
  • General linear model
  • Generalized linear model

What about repeated measures designs?

  • Whenever we have more than one data point from the same participant, we are dealing with a type of repeated measures design
    • between-subjects factors
    • within-subjects factors
  • Everything we have done this semester has been under the assumption that we have one data point per participant (i.e., no within-subjects factors)

  • This is because one of the assumptions of our models was that there was no autocorrelation, i.e., that the data were independent

  • Repeated measures designs introduce autocorrelation into the model. Why?

What’s the big deal?

  • Disregarding the lack of independence = pseudoreplication

  • In other words, treating the data as though they were independent replications when they aren’t

  • This inflates your degrees of freedom, biases your parameter estimates and standard errors, and makes your p-values meaningless

What does a repeated measures design look like?

  • This dataset contains the weights of 50 different chicks at 12 different time points
  • Notice that the first 12 rows come from the same chick (chick 1)
  • In other words, Time can be thought of as a continuous time-series variable (within-subjects) that would violate our assumption of independence
  • Diet, on the other hand, is a between-subjects factor (you can’t be on more than one diet at a time)
  • Notice that both Time and Diet appear in this dataset as numeric values.
  • You have to think about your data to make sure you understand what type of variables you have.
weight Time Chick Diet
42 0 1 1
51 2 1 1
59 4 1 1
64 6 1 1
76 8 1 1
93 10 1 1
106 12 1 1
125 14 1 1
149 16 1 1
171 18 1 1
199 20 1 1
205 21 1 1
40 0 2 1
49 2 2 1
58 4 2 1
72 6 2 1
84 8 2 1
103 10 2 1
122 12 2 1
138 14 2 1
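The printout above matches R’s built-in ChickWeight dataset, so you can check the structure yourself. A minimal sketch (note that in the version shipped with R, Chick and Diet are stored as factors, so verify the types rather than assuming them):

str(ChickWeight)                # how is each variable stored?
head(ChickWeight, n = 12)       # the 12 measurements for chick 1
with(ChickWeight, table(Diet))  # rows per diet (a between-subjects factor)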

What can we do?

  • Currently, the most popular solution is to use a multilevel or hierarchical model
  • These models are commonly referred to as mixed effects models
  • They include both “fixed” effects and “random” effects
  • They provide a flexible framework that allows us to account for the hierarchical structure of our data (within-subjects variables, time-series data, etc.) and provide partial pooling estimates
  • Because they build directly on the linear model, the basic idea of mixed effects models follows naturally from what we already know

How does it work?

Standard linear model

  • Recall our formula for fitting a linear model

\[ \begin{align} y_{i} & \sim Normal(\mu_{i}, \sigma) \\ \mu_{i} & = \alpha + \beta_{1} x_{i} \\ \epsilon_{i} & \sim Normal(0, \sigma^{2}) \end{align} \]

  • We fit these models in R using lm() or glm():
    lm(criterion ~ predictor, data = my_data)
  • In a mixed effects model, what we know as predictors are called fixed effects
  • The novel aspect of a mixed effects model is that it also includes random effects
  • Random effects allow us to account for non-independence

How does it work?

Mixed effects model

  • A mixed effects model mixes both fixed effects and random effects

\[ \hat{y} = \alpha + \color{red}{\beta}\color{blue}{X} + \color{green}{u}\color{purple}{Z} + \epsilon \]

\[ response = intercept + \color{red}{slope} \times \color{blue}{FE} + \color{green}{u} \times \color{purple}{RE} + error \]

  • We fit these models in R using lmer() or glmer() from the lme4 package (there are other options as well)
lmer(criterion ~ fixed_effect + (1|random_effect), data = my_data)
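As a concrete sketch, the chick data from earlier could be fit with a by-chick random intercept (this specification is illustrative; choosing the random effects structure is discussed below):

# load lme4 for mixed effects models
library(lme4)

# weight as a function of time, with a separate intercept for each chick
m_chick <- lmer(weight ~ Time + (1 | Chick), data = ChickWeight)
summary(m_chick)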

What is a random effect?

  • This is actually a really nuanced question and nobody has a good answer
  • Some people use descriptions like the following:

fixed                       random
--------------------------  --------------------------
repeatable                  non-repeatable
systematic influence        random influence
exhausts the population     samples the population
generally of interest       often not of interest
continuous or categorical   must be categorical

  • …but you will always find exceptions to this
  • Most of the time you will see subjects and items as random effects (things that are repeated in the design)
  • This typically implies that the model includes random intercepts and/or random slopes
  • Instead of trying to define random effects, let’s try to understand what they do (and come back to the idea later)

subjects time response
p_01 1 20.00
p_01 2 21.18
p_01 3 18.93
p_01 4 8.55
p_01 5 26.20
p_01 6 24.15
p_01 7 25.55
p_01 8 19.66
p_01 9 27.61
p_01 10 34.99
p_01 11 42.63
p_01 12 35.87
p_01 13 40.81
p_01 14 40.16
p_01 15 34.22
p_01 16 41.99
p_01 17 44.12
p_01 18 48.13
p_01 19 57.08
p_01 20 57.06
p_02 1 23.22
p_02 2 24.68
p_02 3 32.39
p_02 4 33.54


  • The dataframe includes values of the response variable at 20 different time points for each participant (n = 10)
  • We could model this as:
lm(response ~ time, data = my_df)

\[ \begin{align} response_{i} & \sim Normal(\mu_{i}, \sigma) \\ \mu_{i} & = \alpha + \beta_{1} time_{i} \\ \epsilon_{i} & \sim Normal(0, \sigma^{2}) \end{align} \]

  • But this would violate our assumption of independence (and would produce a terribly misspecified model)
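For reference, data with this structure can be simulated by giving each participant its own intercept and slope, in the spirit of DeBruine & Barr (2021). All parameter values below are made up for illustration:

set.seed(1)
n_subj <- 10  # participants
n_time <- 20  # time points per participant

# by-subject deviations from a population intercept (20) and slope (2)
int_dev   <- rnorm(n_subj, mean = 0, sd = 5)
slope_dev <- rnorm(n_subj, mean = 0, sd = 0.5)

my_df <- expand.grid(time = 1:n_time,
                     subjects = sprintf("p_%02d", 1:n_subj))
i <- as.integer(factor(my_df$subjects))
my_df$response <- (20 + int_dev[i]) + (2 + slope_dev[i]) * my_df$time +
  rnorm(nrow(my_df), mean = 0, sd = 4)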

Let’s include subjects as a random effect


  • We will do this by giving subjects a random intercept
  • In other words, we will allow the intercepts to vary for each participant (as opposed to modeling just 1 intercept)
lmer(response ~ time + (1|subjects), data = my_df)
  • What would this look like?


\[ \begin{align} response_{it} & \sim Normal(\mu_{it}, \sigma) \\ \mu_{it} & = \alpha + \alpha_{subject_{i}} + \beta_{1} time_{t} \\ \alpha_{subject} & \sim Normal(0, \sigma_{subject}) \end{align} \]
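Fitting this model to the simulated data above and inspecting the per-subject estimates (a sketch; coef() combines the fixed effects with each subject’s random intercept):

library(lme4)

# a random intercept for each subject, one shared slope for time
m_ri <- lmer(response ~ time + (1 | subjects), data = my_df)
coef(m_ri)$subjects  # one intercept per subject, identical slopes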

What did we do?

  • We are taking into account the idiosyncratic differences associated with each individual subject
  • By giving each subject its own intercept we are informing the model that each individual has a different starting point in the time course (when time = 0)
  • In general this makes sense because some people…
    • are faster/slower responders
    • speak faster/slower
    • have higher/lower pitched voices

  • We can also take into account the fact that some people…
    • slow down during an experiment
    • respond more/less accurately over time
    • learn at different rates

What did we do?

  • The model now also allows the effect of time to vary for each individual
  • This means we have included a random slope for each participant
  • By adding a random slope for the effect of time, we take into account the fact that responses change at a different rate for each individual
  • Under the hood, the model uses this information to calculate the best fit line for all of the data
  • This method is called partial pooling and represents one of the most important (and least understood) aspects of mixed effects modeling

lmer(response ~ time + (1 + time|subjects), data = my_df)
  • Again, (1 + time|subjects) represents the random effects structure of the model
  • The grouping factor (here subjects) goes to the right of |
  • Everything to the left of | varies by that grouping factor: the 1 gives each subject a random intercept, and any predictor (here time) gets a random slope
  • Thus, (1 + time|subjects) means by-subject random intercepts and random slopes for the effect of time
  • It is rarely a good idea to only use random intercepts
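Continuing the sketch from above, here is the random slope version fit to the simulated data, with a likelihood ratio comparison against the intercept-only random structure (anova() refits both models with ML before comparing):

# by-subject random intercepts and random slopes for time
m_rs <- lmer(response ~ time + (1 + time | subjects), data = my_df)
ranef(m_rs)$subjects  # per-subject deviations for intercept and slope
anova(m_ri, m_rs)     # does adding random slopes improve fit?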



Great news!

  • We have spent the entire semester building up our knowledge of the linear model so that we would be prepared to understand mixed effects models
  • Why? They are the standard in speech sciences now

  • Everything that you have learned in this class applies to a mixed effects model
    • model interpretation
    • nested model comparisons
    • treatment of categorical factors
    • centering, standardizing, and other transformations

  • What about GLMs?
    • Them too!
    • You can fit mixed effects models for count data and binary outcomes using glmer()
    • All you have to do is specify the distribution and the link function

Multilevel logistic regression model

glmer(
  formula = response ~ fixed_effect + 
    (1 + fixed_effect | participant), 
  family = binomial(link = "logit"), 
  data = DATA
)


Multilevel Poisson regression model

glmer(
  formula = counts   ~ fixed_effect + 
    (1 + fixed_effect | participant), 
  family = poisson(link = "log"), 
  data = DATA
)

Understanding partial pooling

Based on Bates (2011) and Mahr (2017) with some help from McElreath (2015)

What do you mean by ‘multiple levels’?

We can think of multilevel models as “models within models”


Generally:

  • One level estimates observed groups/items/individuals/etc.
  • Another level estimates populations of groups/individuals

For this reason, some researchers (myself included) prefer to refer to grouping-level effects and population-level effects, as opposed to random effects and fixed effects


Under this view, random intercepts and random slopes are conceptualized as varying intercepts and varying slopes

So what is ‘pooling’ all about?

  • Some describe multilevel models as ‘models with memory’
    • The population-level model helps inform/learn about the grouping-level
    • They learn faster, better
    • They are robust to overfitting
  • It has to do with how information is ‘pooled’
    1. Complete pooling
    2. No pooling
    3. Partial pooling

Partial pooling

sleepstudy dataset

  • Part of lme4 package
  • Criterion = reaction time (Reaction)
  • Predictor = # of days of sleep deprivation (Days)
  • 10 observations per participant
  • +2 fake participants w/ incomplete data
tibble [183 × 3] (S3: tbl_df/tbl/data.frame)
 $ Reaction: num [1:183] 250 259 251 321 357 ...
 $ Days    : num [1:183] 0 1 2 3 4 5 6 7 8 9 ...
 $ Subject : chr [1:183] "308" "308" "308" "308" ...
Reaction Days Subject
249.5600 0 308
258.7047 1 308
250.8006 2 308
321.4398 3 308
356.8519 4 308
414.6901 5 308
382.2038 6 308
290.1486 7 308
430.5853 8 308
466.3535 9 308
222.7339 0 309
205.2658 1 309

No pooling

Model Subject Intercept Slope_Days
No pooling 308 244.1927 21.764702
No pooling 309 205.0549 2.261785
No pooling 310 203.4842 6.114899
No pooling 330 289.6851 3.008073
No pooling 331 285.7390 5.266019
No pooling 332 264.2516 9.566768
No pooling 333 275.0191 9.142046
No pooling 334 240.1629 12.253141
No pooling 335 263.0347 -2.881034
No pooling 337 290.1041 19.025974
No pooling 349 215.1118 13.493933
No pooling 350 225.8346 19.504017
No pooling 351 261.1470 6.433498
No pooling 352 276.3721 13.566549
No pooling 369 254.9681 11.348109
No pooling 370 210.4491 18.056151
No pooling 371 253.6360 9.188445
No pooling 372 267.0448 11.298073
No pooling 374 286.0000 2.000000
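Per-subject estimates like these can be reproduced by fitting a separate regression to each subject, e.g., with lme4::lmList (a sketch; df_sleep is the augmented sleepstudy data described above, and the fake subject with a single observation drops out because a slope cannot be estimated from one point):

library(lme4)

# one lm() per subject; no information is shared across subjects
no_pool <- lmList(Reaction ~ Days | Subject, data = df_sleep)
coef(no_pool)  # per-subject intercepts and slopes, as in the table above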

Complete pooling


Call:
lm(formula = Reaction ~ Days, data = df_sleep)

Residuals:
     Min       1Q   Median       3Q      Max 
-110.646  -27.951    1.829   26.388  139.875 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  252.321      6.406  39.389  < 2e-16 ***
Days          10.328      1.210   8.537 5.48e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 47.43 on 181 degrees of freedom
Multiple R-squared:  0.2871,    Adjusted R-squared:  0.2831 
F-statistic: 72.88 on 1 and 181 DF,  p-value: 5.484e-15

Complete pooling

Model Subject Intercept Slope_Days
Complete pooling 308 252.3207 10.32766
Complete pooling 309 252.3207 10.32766
Complete pooling 310 252.3207 10.32766
Complete pooling 330 252.3207 10.32766
Complete pooling 331 252.3207 10.32766
Complete pooling 332 252.3207 10.32766
Complete pooling 333 252.3207 10.32766
Complete pooling 334 252.3207 10.32766
Complete pooling 335 252.3207 10.32766
Complete pooling 337 252.3207 10.32766
Complete pooling 349 252.3207 10.32766
Complete pooling 350 252.3207 10.32766
Complete pooling 351 252.3207 10.32766
Complete pooling 352 252.3207 10.32766
Complete pooling 369 252.3207 10.32766
Complete pooling 370 252.3207 10.32766
Complete pooling 371 252.3207 10.32766
Complete pooling 372 252.3207 10.32766
Complete pooling 374 252.3207 10.32766
Complete pooling 373 252.3207 10.32766

Can we improve estimates with partial pooling?

Reaction ~ 1 + Days + (1 + Days | Subject)


  • We allow the effect of Days to vary for each Subject
  • By-subject random intercepts with random slope for Days
lmer(Reaction ~ 1 + Days + (1 + Days | Subject), data = df_sleep)

Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
   Data: df_sleep

REML criterion at convergence: 1771.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9707 -0.4703  0.0276  0.4594  5.2009 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 582.72   24.140       
          Days         35.03    5.919   0.07
 Residual             649.36   25.483       
Number of obs: 183, groups:  Subject, 20

Fixed effects:
            Estimate Std. Error t value
(Intercept)  252.543      6.433  39.257
Days          10.452      1.542   6.778

Correlation of Fixed Effects:
     (Intr)
Days -0.137
  • Most of this will look familiar
  • Fixed effects estimates (and interpretations) work just like lm, glm, etc.
  • What’s new? The random effects estimates

Partial pooling

Subject Intercept Slope_Days Model
308 253.9478 19.6264337 Partial pooling
309 211.7331 1.7319161 Partial pooling
310 213.1582 4.9061511 Partial pooling
330 275.1425 5.6436007 Partial pooling
331 273.7286 7.3862730 Partial pooling
332 260.6504 10.1632571 Partial pooling
333 268.3683 10.2246059 Partial pooling
334 244.5524 11.4837802 Partial pooling
335 251.3702 -0.3355788 Partial pooling
337 286.2319 19.1090424 Partial pooling
349 226.7663 11.5531844 Partial pooling
350 238.7807 17.0156827 Partial pooling
351 256.2344 7.4119456 Partial pooling
352 272.3511 13.9920878 Partial pooling
369 254.9484 11.2985770 Partial pooling
370 226.3701 15.2027877 Partial pooling
371 252.5051 9.4335409 Partial pooling
372 263.8916 11.7253429 Partial pooling
373 248.9753 10.3915288 Partial pooling
374 271.1450 11.0782516 Partial pooling

  • Multilevel models use the grouping variables to learn about the nested structure of the data
  • The model can use this information to make educated guesses when there is incomplete information
  • The model takes advantage of partial pooling, producing shrinkage
  • This builds skepticism into the model with regard to extreme values
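The partially pooled estimates in the table above can be recovered directly from the fitted model; coef() adds each subject’s random effect deviations to the population-level estimates (a sketch):

m_pp <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject), data = df_sleep)
fixef(m_pp)         # population-level (fixed) intercept and slope
coef(m_pp)$Subject  # per-subject, partially pooled intercepts and slopes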

References

Bates, D. (2011). Mixed models in R using the lme4 package part 2: Longitudinal data, modeling interactions. http://lme4.r-forge.r-project.org/slides/2011-03-16-Amsterdam/2Longitudinal.pdf
Brown, V. A. (2021). An introduction to linear mixed-effects modeling in R. Advances in Methods and Practices in Psychological Science, 4(1), 1–19. https://doi.org/10.1177/2515245920960351
DeBruine, L. M., & Barr, D. J. (2021). Understanding mixed-effects models through data simulation. Advances in Methods and Practices in Psychological Science, 4(1), 1–13. https://doi.org/10.1177/2515245920965119
Mahr, T. J. (2017). Plotting partial pooling in mixed-effects models. https://www.tjmahr.com/plotting-partial-pooling-in-mixed-effects-models/
Mahr, T. J. (2019). Another mixed effects model visualization. https://www.tjmahr.com/another-mixed-effects-model-visualization/
McElreath, R. (2015). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press.