@chelseaparlettpelleriti The model with 1000 names #statsTikTok
♬ Say So (feat. Nicki Minaj) - Doja Cat / Nicki Minaj
Joseph V. Casillas, PhD
Rutgers University
Last update: 2025-04-18
Everything we have done this semester has been under the assumption that we have one data point per participant (i.e., no within-subjects factors)
This is because one of the assumptions of our models was that there was no autocorrelation, i.e., that the data were independent
Repeated measures designs introduce autocorrelation into the model. Why?
Disregarding lack of independence = pseudo-replication
In other words, replicating the data as though they were independent when they aren't
This will inflate your degrees of freedom, bias your parameter estimates, and make your p-values meaningless
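As a quick illustration (a minimal sketch, not from the original materials: the subjects, group labels, and effect sizes below are all invented), here is what happens when we simulate repeated measures with no true group effect and then analyze them as if every row were independent:

```r
library(lme4)

set.seed(1)
n_subj <- 10   # 10 participants
n_obs  <- 20   # 20 measurements each

sim <- data.frame(
  subject = rep(paste0("p_", 1:n_subj), each = n_obs),
  group   = rep(c("a", "b"), each = n_subj / 2 * n_obs)  # between-subjects, no true effect
)
# Each subject has their own baseline; 'group' contributes nothing
sim$y <- rep(rnorm(n_subj, mean = 0, sd = 5), each = n_obs) + rnorm(nrow(sim))

summary(lm(y ~ group, data = sim))                    # pretends there are 200 independent rows
summary(lmer(y ~ group + (1 | subject), data = sim))  # respects the 10 subjects
```

The naive lm() uses 198 residual degrees of freedom and a far-too-small standard error, so it will happily declare "effects" that the mixed model (correctly) does not.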
The chick weight data (chick 1 and the start of chick 2 are shown below)
Time can be thought of as a continuous time-series variable (within-subjects) that would violate our assumptions of independence
Diet, on the other hand, is a between-subjects factor (you can't be on more than one diet at a time)
Time and Diet appear in this dataset as numeric values.
weight | Time | Chick | Diet |
---|---|---|---|
42 | 0 | 1 | 1 |
51 | 2 | 1 | 1 |
59 | 4 | 1 | 1 |
64 | 6 | 1 | 1 |
76 | 8 | 1 | 1 |
93 | 10 | 1 | 1 |
106 | 12 | 1 | 1 |
125 | 14 | 1 | 1 |
149 | 16 | 1 | 1 |
171 | 18 | 1 | 1 |
199 | 20 | 1 | 1 |
205 | 21 | 1 | 1 |
40 | 0 | 2 | 1 |
49 | 2 | 2 | 1 |
58 | 4 | 2 | 1 |
72 | 6 | 2 | 1 |
84 | 8 | 2 | 1 |
103 | 10 | 2 | 1 |
122 | 12 | 2 | 1 |
138 | 14 | 2 | 1 |
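The columns above match the ChickWeight data that ships with base R (an assumption based on the column names); a quick way to look at it:

```r
# ChickWeight comes with base R (the datasets package)
data("ChickWeight")

str(ChickWeight)           # weight, Time, Chick, Diet
head(ChickWeight, n = 12)  # the rows for chick 1 shown above
xtabs(~ Diet, data = unique(ChickWeight[, c("Chick", "Diet")]))  # chicks per diet
```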
\[ \begin{align} y_{i} & \sim Normal(\mu_{i}, \sigma) \\ \mu_{i} & = \alpha + \beta_{1} x_{i} \\ \epsilon_{i} & \sim Normal(0, \sigma^{2}) \end{align} \]
We fit these models using lm() or glm():
lm(criterion ~ predictor, data = my_data)
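For example, a single-level fit to the chick data would look like the sketch below (illustrative only; as written it ignores the repeated measures per chick):

```r
# Naive single-level model: every row of ChickWeight is treated as independent
mod_lm <- lm(weight ~ Time, data = ChickWeight)
summary(mod_lm)
```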
\[ \hat{y} = \alpha + \color{red}{\beta}\color{blue}{X} + \color{green}{u}\color{purple}{Z} + \epsilon \]
\[ response = intercept + \color{red}{slope} \times \color{blue}{FE} + \color{green}{u} \times \color{purple}{RE} + error \]
We fit mixed-effects models using lmer() or glmer() from the lme4 package (there are other options as well):
lmer(criterion ~ fixed_effect + (1|random_effect), data = my_data)
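Applied to the chick data, a minimal mixed-effects version might look like this (a sketch with a by-chick random intercept; richer random structures are possible):

```r
library(lme4)

# Random intercept per chick accounts for the repeated weighings over Time
mod_lmer <- lmer(weight ~ Time + (1 | Chick), data = ChickWeight)
summary(mod_lmer)
```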
fixed | random |
---|---|
repeatable | non repeatable |
systematic influence | random influence |
exhaust the pop. | sample the pop. |
generally of interest | often not of interest |
continuous or categorical | have to be categorical |
subjects | time | response |
---|---|---|
p_01 | 1 | 20.00 |
p_01 | 2 | 21.18 |
p_01 | 3 | 18.93 |
p_01 | 4 | 8.55 |
p_01 | 5 | 26.20 |
p_01 | 6 | 24.15 |
p_01 | 7 | 25.55 |
p_01 | 8 | 19.66 |
p_01 | 9 | 27.61 |
p_01 | 10 | 34.99 |
p_01 | 11 | 42.63 |
p_01 | 12 | 35.87 |
p_01 | 13 | 40.81 |
p_01 | 14 | 40.16 |
p_01 | 15 | 34.22 |
p_01 | 16 | 41.99 |
p_01 | 17 | 44.12 |
p_01 | 18 | 48.13 |
p_01 | 19 | 57.08 |
p_01 | 20 | 57.06 |
p_02 | 1 | 23.22 |
p_02 | 2 | 24.68 |
p_02 | 3 | 32.39 |
p_02 | 4 | 33.54 |
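The values above are simulated for illustration; a sketch of how a data frame like this (called my_df to match the model calls below; the intercepts, slopes, and noise level are made up) could be generated:

```r
set.seed(123)

n_subj <- 10
n_time <- 20

my_df <- data.frame(
  subjects = rep(sprintf("p_%02d", 1:n_subj), each = n_time),
  time     = rep(1:n_time, times = n_subj)
)

# Every participant gets their own (made-up) baseline and rate of change
subj_id    <- rep(1:n_subj, each = n_time)
intercepts <- rnorm(n_subj, mean = 20, sd = 5)
slopes     <- rnorm(n_subj, mean = 1.5, sd = 0.75)

my_df$response <- intercepts[subj_id] + slopes[subj_id] * my_df$time +
  rnorm(nrow(my_df), sd = 4)

head(my_df)
```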
We measure a response variable at 20 different time points for each participant (n = 10)
A simple linear model would be: lm(response ~ time, data = my_df)
\[ \begin{align} response_{i} & \sim Normal(\mu_{i}, \sigma) \\ \mu_{i} & = \alpha + \beta_{1} time_{i} \\ \epsilon_{i} & \sim Normal(0, \sigma^{2}) \end{align} \]
We include subjects as a random effect:
lmer(response ~ time + (1|subjects), data = my_df)
\[ \begin{align} response_{it} & \sim Normal(\mu_{it}, \sigma) \\ \mu_{it} & = \alpha + \alpha_{subject_{i}} + \beta_{1} time_{t} \\ \epsilon_{it} & \sim Normal(0, \sigma^{2}) \end{align} \]
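Continuing with the simulated my_df sketched earlier (hypothetical data, not the original slide output), we can fit the random-intercept model and pull out the subject-specific estimates:

```r
library(lme4)

ri_mod <- lmer(response ~ time + (1 | subjects), data = my_df)

fixef(ri_mod)            # population-level intercept and slope for time
ranef(ri_mod)$subjects   # each subject's deviation from the population intercept
coef(ri_mod)$subjects    # fixed + random parts combined: one intercept per subject
```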
The random intercept gives each subject their own baseline (the model estimate when time = 0)
By also adding a random slope for time we take into account the fact that response can change for each individual at a different rate:
lmer(response ~ time + (1 + time|subjects), data = my_df)
(1 + time|subjects) represents the random structure of the model
The 1 to the left of the | is a random intercept
The grouping factor to the right of the | is given a random slope for the effect specified next to the 1
So (1 + time|subjects) means random slopes for the effect time for each subject
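A few related ways of writing the random structure in lme4 formula syntax (shown with the same hypothetical variable names):

```r
# Random intercept only: each subject gets their own baseline
response ~ time + (1 | subjects)

# Random intercept + random slope for time; the intercept is implicit,
# so these two formulas are equivalent
response ~ time + (1 + time | subjects)
response ~ time + (time | subjects)

# Same random intercept and slope, but with the intercept-slope
# correlation suppressed (double-bar syntax)
response ~ time + (1 + time || subjects)
```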
If the criterion is not continuous (e.g., binary or count data), we use glmer() instead of lmer() and specify a family
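For instance, with a (purely invented) binary accuracy column added to the simulated my_df, a logistic mixed model would look like this sketch:

```r
# 'accuracy' is invented here just to show the glmer() syntax
my_df$accuracy <- rbinom(nrow(my_df), size = 1,
                         prob = plogis(-2 + 0.2 * my_df$time))

glmer(accuracy ~ time + (1 | subjects), data = my_df,
      family = binomial(link = "logit"))
```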
We can think of multilevel models as models in which some parameters (intercepts, slopes) are allowed to vary by group
Generally, the fixed effects describe the population as a whole and the random effects describe how individual groups (e.g., subjects) deviate from it
For this reason, some researchers (myself included) prefer to refer to grouping-level effects and population-level effects, as opposed to random effects and fixed effects
Under this view, random intercepts and random slopes are conceptualized as varying intercepts and varying slopes
We'll use the sleepstudy dataset from the lme4 package
The criterion is average reaction time in ms (Reaction)
The predictor is days of sleep deprivation (Days)
tibble [183 × 3] (S3: tbl_df/tbl/data.frame)
$ Reaction: num [1:183] 250 259 251 321 357 ...
$ Days : num [1:183] 0 1 2 3 4 5 6 7 8 9 ...
$ Subject : chr [1:183] "308" "308" "308" "308" ...
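The raw sleepstudy data can be loaded straight from lme4; df_sleep in these slides has 183 rows and 20 subjects, so it appears to be sleepstudy with a couple of additional, partially observed subjects added (the code below only loads the original and is a sketch, not an exact recreation of df_sleep):

```r
library(lme4)

data("sleepstudy")
str(sleepstudy)   # 180 obs: Reaction (ms), Days (0-9), Subject (18 participants)
```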
Reaction | Days | Subject |
---|---|---|
249.5600 | 0 | 308 |
258.7047 | 1 | 308 |
250.8006 | 2 | 308 |
321.4398 | 3 | 308 |
356.8519 | 4 | 308 |
414.6901 | 5 | 308 |
382.2038 | 6 | 308 |
290.1486 | 7 | 308 |
430.5853 | 8 | 308 |
466.3535 | 9 | 308 |
222.7339 | 0 | 309 |
205.2658 | 1 | 309 |
Model | Subject | Intercept | Slope_Days |
---|---|---|---|
No pooling | 308 | 244.1927 | 21.764702 |
No pooling | 309 | 205.0549 | 2.261785 |
No pooling | 310 | 203.4842 | 6.114899 |
No pooling | 330 | 289.6851 | 3.008073 |
No pooling | 331 | 285.7390 | 5.266019 |
No pooling | 332 | 264.2516 | 9.566768 |
No pooling | 333 | 275.0191 | 9.142046 |
No pooling | 334 | 240.1629 | 12.253141 |
No pooling | 335 | 263.0347 | -2.881034 |
No pooling | 337 | 290.1041 | 19.025974 |
No pooling | 349 | 215.1118 | 13.493933 |
No pooling | 350 | 225.8346 | 19.504017 |
No pooling | 351 | 261.1470 | 6.433498 |
No pooling | 352 | 276.3721 | 13.566549 |
No pooling | 369 | 254.9681 | 11.348109 |
No pooling | 370 | 210.4491 | 18.056151 |
No pooling | 371 | 253.6360 | 9.188445 |
No pooling | 372 | 267.0448 | 11.298073 |
No pooling | 374 | 286.0000 | 2.000000 |
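One way to produce per-subject "no pooling" estimates like those above (an assumption about how the table was built, using the df_sleep object from these slides) is to fit a separate regression to each subject with lmList() from lme4:

```r
library(lme4)

# A separate lm() per subject: no information is shared across subjects
no_pooling <- lmList(Reaction ~ Days | Subject, data = df_sleep)
coef(no_pooling)   # one intercept and one Days slope per subject
```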
Call:
lm(formula = Reaction ~ Days, data = df_sleep)
Residuals:
Min 1Q Median 3Q Max
-110.646 -27.951 1.829 26.388 139.875
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 252.321 6.406 39.389 < 2e-16 ***
Days 10.328 1.210 8.537 5.48e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 47.43 on 181 degrees of freedom
Multiple R-squared: 0.2871, Adjusted R-squared: 0.2831
F-statistic: 72.88 on 1 and 181 DF, p-value: 5.484e-15
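Under "complete pooling" every subject is simply assigned the estimates from this single lm() fit; a sketch of how the table that follows could be assembled:

```r
cp_mod <- lm(Reaction ~ Days, data = df_sleep)

# Every subject gets the same population-level intercept and slope
complete_pooling <- data.frame(
  Model      = "Complete pooling",
  Subject    = unique(df_sleep$Subject),
  Intercept  = coef(cp_mod)[["(Intercept)"]],
  Slope_Days = coef(cp_mod)[["Days"]]
)
head(complete_pooling)
```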
Model | Subject | Intercept | Slope_Days |
---|---|---|---|
Complete pooling | 308 | 252.3207 | 10.32766 |
Complete pooling | 309 | 252.3207 | 10.32766 |
Complete pooling | 310 | 252.3207 | 10.32766 |
Complete pooling | 330 | 252.3207 | 10.32766 |
Complete pooling | 331 | 252.3207 | 10.32766 |
Complete pooling | 332 | 252.3207 | 10.32766 |
Complete pooling | 333 | 252.3207 | 10.32766 |
Complete pooling | 334 | 252.3207 | 10.32766 |
Complete pooling | 335 | 252.3207 | 10.32766 |
Complete pooling | 337 | 252.3207 | 10.32766 |
Complete pooling | 349 | 252.3207 | 10.32766 |
Complete pooling | 350 | 252.3207 | 10.32766 |
Complete pooling | 351 | 252.3207 | 10.32766 |
Complete pooling | 352 | 252.3207 | 10.32766 |
Complete pooling | 369 | 252.3207 | 10.32766 |
Complete pooling | 370 | 252.3207 | 10.32766 |
Complete pooling | 371 | 252.3207 | 10.32766 |
Complete pooling | 372 | 252.3207 | 10.32766 |
Complete pooling | 374 | 252.3207 | 10.32766 |
Complete pooling | 373 | 252.3207 | 10.32766 |
Reaction ~ 1 + Days + (1 + Days | Subject)
We allow the intercept and the effect of Days to vary for each Subject
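Fitting this model with lmer() (the object name fit is just a placeholder) produces the summary below:

```r
library(lme4)

fit <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject), data = df_sleep)
summary(fit)
```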
Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
Data: df_sleep
REML criterion at convergence: 1771.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.9707 -0.4703 0.0276 0.4594 5.2009
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 582.72 24.140
Days 35.03 5.919 0.07
Residual 649.36 25.483
Number of obs: 183, groups: Subject, 20
Fixed effects:
Estimate Std. Error t value
(Intercept) 252.543 6.433 39.257
Days 10.452 1.542 6.778
Correlation of Fixed Effects:
(Intr)
Days -0.137
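The partially pooled per-subject estimates shown in the next table combine the fixed effects with each subject's random effects; assuming the fitted object fit from the sketch above, coef() returns exactly that combination:

```r
fixef(fit)           # population-level intercept and Days slope
ranef(fit)$Subject   # per-subject deviations from the population estimates
coef(fit)$Subject    # their sum: the partially pooled estimates per subject
```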
Subject | Intercept | Slope_Days | Model |
---|---|---|---|
308 | 253.9478 | 19.6264337 | Partial pooling |
309 | 211.7331 | 1.7319161 | Partial pooling |
310 | 213.1582 | 4.9061511 | Partial pooling |
330 | 275.1425 | 5.6436007 | Partial pooling |
331 | 273.7286 | 7.3862730 | Partial pooling |
332 | 260.6504 | 10.1632571 | Partial pooling |
333 | 268.3683 | 10.2246059 | Partial pooling |
334 | 244.5524 | 11.4837802 | Partial pooling |
335 | 251.3702 | -0.3355788 | Partial pooling |
337 | 286.2319 | 19.1090424 | Partial pooling |
349 | 226.7663 | 11.5531844 | Partial pooling |
350 | 238.7807 | 17.0156827 | Partial pooling |
351 | 256.2344 | 7.4119456 | Partial pooling |
352 | 272.3511 | 13.9920878 | Partial pooling |
369 | 254.9484 | 11.2985770 | Partial pooling |
370 | 226.3701 | 15.2027877 | Partial pooling |
371 | 252.5051 | 9.4335409 | Partial pooling |
372 | 263.8916 | 11.7253429 | Partial pooling |
373 | 248.9753 | 10.3915288 | Partial pooling |
374 | 271.1450 | 11.0782516 | Partial pooling |
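To visualize the shrinkage that partial pooling induces, one option is to stack the three per-subject tables and plot them; the sketch below assumes they have been combined into a single data frame called all_estimates with the columns Model, Subject, Intercept, and Slope_Days:

```r
library(ggplot2)

# all_estimates: the no pooling, complete pooling, and partial pooling tables
# bound together with rbind()
ggplot(all_estimates, aes(x = Intercept, y = Slope_Days, color = Model)) +
  geom_point(size = 2) +
  geom_line(aes(group = Subject), color = "grey60") +
  labs(title = "Per-subject estimates under no, complete, and partial pooling")
```

Subjects with few observations (e.g., 374) are pulled strongly toward the population-level estimates, while well-sampled subjects barely move.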