Joseph V. Casillas, PhD
Rutgers University
Last update: 2025-04-01
A quick review
Classical MRC
A quick review
Classical ANOVA
Classical Analysis of Variance (ANOVA) assumes that all predictors are categorical (discontinuous) variables
ANOVA methods are often abused by forcing continuous variables into categorical form (e.g., via median splits), which can reduce statistical power by as much as 50%
A quick review
Both types of variables (continuous and categorical) exist in the world
Categorical
Continuous
A quick review
The modern GLM
The modern GLM includes both MRC and ANOVA:
A quick review
The modern GLM
Aristotle
Plato
A brief history
How unification occurred
A brief history
Two Disciplines of Scientific Psychology (Cronbach, 1957)
Differential Psychology
(Galton and Pearson)
Experimental Psychology
(Fechner, Weber, and Wundt)
A brief history
The two disciplines
MRC and ANOVA are both part of the same general linear model, but over time the two traditions diverged from each other:
Fisher (experimentalists) vs. Pearson (observationalists):
A brief history
Sir Ronald Aylmer Fisher
“There is, then, in this analysis of variance no indication of any other than innate and heritable factors at work.”
(coining the phrase ‘analysis of variance’)
– R.A. Fisher (1919)
“Critical tests of this kind may be called tests of significance, and when such tests are available we may discover whether a second sample is or is not significantly different from the first.”
(coining the phrase ‘test of significance’)
– R.A. Fisher (1925), p.43
A brief history
Sir Ronald Aylmer Fisher
Invented ANOVA to support
A brief history
Fisher’s method
A brief history
Analysis of Variance (ANOVA)
How did Sir Ronald Fisher build the ANOVA model?
He built it from the MRC model…
A brief history
Summed Linear Deviations (MRC)
Sum of Linear Deviations:
\[(y_{i} - \bar{y}) = (\hat{y}_{i} - \bar{y}) + (y_{i} - \hat{y}_{i})\]
Total Deviation = Predicted Deviation + Error Deviation
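A minimal sketch of this identity in R (using the built-in mtcars data as a stand-in example, not the data from these slides):

# Simple regression: predict mpg from weight
fit <- lm(mpg ~ wt, data = mtcars)

y     <- mtcars$mpg   # observed scores, y_i
y_hat <- fitted(fit)  # model predictions, y-hat_i
y_bar <- mean(y)      # mean of y, y-bar

# Total deviation = predicted deviation + error deviation, case by case
all.equal(unname(y - y_bar), unname((y_hat - y_bar) + (y - y_hat)))  # TRUE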
A brief history
Multiple Regression/Correlation
In MRC, the predicted y ( \(\hat{y}_{i}\) ) is the score predicted from the regression line:
MRC uses individual scores as the criterion variable (y)
Continuous variables serve as the predictors of y
A brief history
Summed Linear Deviations (ANOVA)
Sum of Linear Deviations:
\[(y_{i} - \bar{y}_{G}) = (\bar{y}_{j} - \bar{y}_{G}) + (y_{i} - \bar{y}_{j})\]
Total Deviation = Predicted Deviation + Error Deviation
A brief history
Analysis of Variance
In ANOVA, you are dealing with groups:
A brief history
Analysis of Variance
What is the best prediction you can make about any individual in a group if you don’t know anything else about that individual? The group mean ( \(\bar{y}_{j}\) )
If I don’t know anything else about you, my prediction for you is based on your group’s mean
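In R terms (a sketch, again using mtcars and treating cyl as a grouping factor, which is this example's assumption): when group membership is the only predictor, the model's prediction for every member of a group is that group's mean.

# One categorical predictor: each fitted value is the case's group mean
fit_grp <- lm(mpg ~ factor(cyl), data = mtcars)

group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)  # y-bar_j for each group
fitted_vals <- fitted(fit_grp)                       # model predictions

# Every fitted value matches the corresponding group mean
all.equal(unname(fitted_vals),
          unname(group_means[as.character(mtcars$cyl)]))  # TRUE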
A brief history
Sums of Squared Deviations (MRC)
SS = Sum of squares:
\[\sum (y_{i} - \bar{y})^2 = \sum (\hat{y}_{i} - \bar{y})^2 + \sum (y_{i} - \hat{y}_{i})^2\]
\[SS_{Total} = SS_{Predicted} + SS_{Error}\]
\(SS_{Total}\) | = | \(\sum (y_{i} - \bar{y})^2\) |
\(SS_{Predicted}\) | = | \(\sum (\hat{y}_{i} - \bar{y})^2\) |
\(SS_{Error}\) | = | \(\sum (y_{i} - \hat{y}_{i})^2\) |
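The same partition computed directly in R (a sketch continuing the hypothetical mpg ~ wt example):

fit <- lm(mpg ~ wt, data = mtcars)

y     <- mtcars$mpg
y_hat <- fitted(fit)
y_bar <- mean(y)

ss_total     <- sum((y - y_bar)^2)
ss_predicted <- sum((y_hat - y_bar)^2)
ss_error     <- sum((y - y_hat)^2)

# SS_Total = SS_Predicted + SS_Error
all.equal(ss_total, ss_predicted + ss_error)  # TRUE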
A brief history
Sums of Squared Deviations (ANOVA)
SS = Sum of squares:
\[\sum (y_{i} - \bar{y}_{G})^2 = \sum (\bar{y}_{j} - \bar{y}_{G})^2 + \sum (y_{i} - \bar{y}_{j})^2\]
\[SS_{Total} = SS_{Predicted} + SS_{Error}\]
\(SS_{Total}\) | = | \(\sum (y_{i} - \bar{y}_{G})^2\) |
\(SS_{Predicted}\) | = | \(\sum (\bar{y}_{j} - \bar{y}_{G})^2\) |
\(SS_{Error}\) | = | \(\sum (y_{i} - \bar{y}_{j})^2\) |
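And the group-based version, where the group means play the role of the predictions (sketch):

fit_grp <- lm(mpg ~ factor(cyl), data = mtcars)

y       <- mtcars$mpg
y_bar_g <- mean(y)            # grand mean
y_bar_j <- fitted(fit_grp)    # each case's group mean

ss_total <- sum((y - y_bar_g)^2)
ss_bg    <- sum((y_bar_j - y_bar_g)^2)  # between-group SS
ss_wg    <- sum((y - y_bar_j)^2)        # within-group SS

all.equal(ss_total, ss_bg + ss_wg)  # TRUE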
A brief history
Squared Multiple Correlation Coefficient (MRC)
\[R^2 = \frac{\sum (\hat{y}_{i} - \bar{y})^2} {\sum (y_{i} - \bar{y})^2}\]
\[R^2 = \frac{SS_{Predicted}} {SS_{Total}}\]
Coefficient of determination
Proportion of Variance Explained
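A quick check that SS_Predicted / SS_Total reproduces the R² reported by lm() (sketch, same hypothetical example as above):

fit <- lm(mpg ~ wt, data = mtcars)

ss_predicted <- sum((fitted(fit) - mean(mtcars$mpg))^2)
ss_total     <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

ss_predicted / ss_total   # proportion of variance explained
summary(fit)$r.squared    # same value from the fitted model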
A brief history
Squared Multiple Correlation Coefficient (ANOVA)
\[R^2 = \frac{\sum (\bar{y}_{j} - \bar{y}_{G})^2} {\sum (y_{i} - \bar{y}_{G})^2}\]
\[R^2 = \frac{SS_{Predicted}} {SS_{Total}}\]
Coefficient of determination
Proportion of Variance Explained
A brief history
Mean Squared Deviations (MRC)
MS = Mean Squares (Variances):
\(MS_{Total}\) | = | \(\sum (y_{i} - \bar{y})^2 / (n - 1)\) |
\(MS_{Predicted}\) | = | \(\sum (\hat{y}_{i} - \bar{y})^2 / (k)\) |
\(MS_{Error}\) | = | \(\sum (y_{i} - \hat{y}_{i})^2 / (n - k - 1)\) |
\[F_{(k), (n-k-1)} = \frac{\sum (\hat{y}_{i} - \bar{y})^2 / (k)} {\sum (y_{i} - \hat{y}_{i})^2 / (n - k - 1)}\]
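Computed by hand in R (sketch; k is the number of predictors, here 1):

fit <- lm(mpg ~ wt, data = mtcars)
n <- nrow(mtcars)
k <- 1   # number of predictors

ms_predicted <- sum((fitted(fit) - mean(mtcars$mpg))^2) / k
ms_error     <- sum(residuals(fit)^2) / (n - k - 1)

ms_predicted / ms_error    # F ratio by hand
summary(fit)$fstatistic    # F with df k and n - k - 1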
A brief history
Mean Squared Deviations (ANOVA)
MS = Mean Squares (Variances):
\(MS_{Total}\) | = | \(\sum (y_{i} - \bar{y}_{G})^2 / (n - 1)\) |
\(MS_{Predicted}\) | = | \(\sum (\bar{y}_{j} - \bar{y}_{G})^2 / (g - 1)\) |
\(MS_{Error}\) | = | \(\sum (y_{i} - \bar{y}_{j})^2 / (n - g)\) |
\[F_{(g-1), (n-g)} = \frac{\sum (\bar{y}_{j} - \bar{y}_{G})^2 / (g - 1)} {\sum (y_{i} - \bar{y}_{j})^2 / (n - g)}\]
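The same ratio for the group model, checked against R's ANOVA table (sketch; g is the number of groups):

fit_grp <- lm(mpg ~ factor(cyl), data = mtcars)
n <- nrow(mtcars)
g <- length(unique(mtcars$cyl))   # number of groups

ms_bg <- sum((fitted(fit_grp) - mean(mtcars$mpg))^2) / (g - 1)
ms_wg <- sum(residuals(fit_grp)^2) / (n - g)

ms_bg / ms_wg     # F ratio by hand
anova(fit_grp)    # same F in the ANOVA table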
A brief history
Mean Squared Deviations (MRC/ANOVA)
MS = Mean Squares (Variances):
\(MS_{Total}\) | = | \(SS_{Total} / df_{Total}\) |
\(MS_{Predicted}\) | = | \(SS_{Predicted} / df_{Predicted}\) |
\(MS_{Error}\) | = | \(SS_{Error} / df_{Error}\) |
\[F\text{-ratio} = \frac{MS_{Predicted}} {MS_{Error}}\]
A brief history
Degrees of Freedom
MRC
\(df_{Total}\) | = | n - 1 |
\(df_{Predicted}\) | = | k |
\(df_{Error}\) | = | n - k - 1 |
\[df_{Total} = df_{Predicted} + df_{Error}\]
ANOVA
\(df_{Total}\) | = | n - 1 |
\(df_{Predicted}\) | = | g - 1 |
\(df_{Error}\) | = | n - g |
\[df_{Total} = df_{Predicted} + df_{Error}\]
A brief history
Equivalences
MRC | | ANOVA |
---|---|---|
\(SS_{Predicted}\) | = | \(SS_{BG}\) |
\(SS_{Error}\) | = | \(SS_{WG}\) |
\(MS_{Predicted}\) | = | \(MS_{BG}\) |
\(MS_{Error}\) | = | \(MS_{WG}\) |
k | = | g - 1 |
n - k - 1 | = | n - g |

(BG = between groups, WG = within groups)
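One way to see the equivalence directly (sketch): fit the same group model with lm() and with aov() and compare the tables, which report identical sums of squares, mean squares, and F.

fit_lm  <- lm(mpg ~ factor(cyl), data = mtcars)
fit_aov <- aov(mpg ~ factor(cyl), data = mtcars)

anova(fit_lm)     # regression framing: SS_Predicted / SS_Error
summary(fit_aov)  # ANOVA framing: SS between groups / SS within groups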
A brief history
Equivalences (MRC/ANOVA)
A brief history
The Logic of the F-Ratio
So how did unification occur?
The General Linear Model
Dummy variables
\[\hat{y}_{i} = a + b_{1}x_{1} + b_{2}x_{2} + b_{3}x_{3} ...\]
What if the predictors are dummy variables?
\(x_{1}\) | = | \(d_{1}\) | = | 0 or 1 |
\(x_{2}\) | = | \(d_{2}\) | = | 0 or 1 |
\(x_{3}\) | = | \(d_{3}\) | = | 0 or 1 |
A single dummy variable:
\[\hat{y}_{i} = a + b_{1}d_{1}\]
Evaluating the function:
\(\hat{y}_{(d1=1)} =\) | \(a + b_{1}\) |
\(\hat{y}_{(d1=0)} =\) | \(a\) |
\(\bar{y}_{(d1=1)} = \hat{y}_{(d1=1)} = a + b_{1}\) |
\(\bar{y}_{(d1=0)} = \hat{y}_{(d1=0)} = a\) |
\[b_{1} = (\bar{y}_{(d1=1)} - \bar{y}_{(d1=0)})\]
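A sketch of this in R, using a 0/1 variable that already exists in mtcars (am, automatic vs. manual, chosen only for illustration):

fit_d <- lm(mpg ~ am, data = mtcars)

coef(fit_d)["(Intercept)"]             # a  = mean of the am == 0 group
mean(mtcars$mpg[mtcars$am == 0])

coef(fit_d)["am"]                      # b1 = difference between group means
mean(mtcars$mpg[mtcars$am == 1]) - mean(mtcars$mpg[mtcars$am == 0])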
Dummy variables
With j groups you get j-1 dummies
Groups | d1 | d2 | d3 |
---|---|---|---|
A | 0 | 0 | 0 |
B | 1 | 0 | 0 |
C | 0 | 1 | 0 |
D | 0 | 0 | 1 |
Dummy variables
One level is taken as the reference (baseline); its mean becomes the intercept.
Groups | d1 | d2 | d3 | |
---|---|---|---|---|
A | 0 | 0 | 0 | ⬅︎ Intercept |
B | 1 | 0 | 0 | |
C | 0 | 1 | 0 | |
D | 0 | 0 | 1 |
Dummy variables
j-1 dummies = j-1 comparisons
Groups | d1 | d2 | d3 | |
---|---|---|---|---|
A | 0 | 0 | 0 | ⬅︎ Intercept |
B | 1 | 0 | 0 | |
C | 0 | 1 | 0 | |
D | 0 | 0 | 1 | |
⬆︎ AB |
Dummy variables
j-1 dummies = j-1 comparisons
Groups | d1 | d2 | d3 | |
---|---|---|---|---|
A | 0 | 0 | 0 | ⬅︎ Intercept |
B | 1 | 0 | 0 | |
C | 0 | 1 | 0 | |
D | 0 | 0 | 1 | |
⬆︎ AC |
Dummy variables
j-1 dummies = j-1 comparisons
Groups | d1 | d2 | d3 | |
---|---|---|---|---|
A | 0 | 0 | 0 | ⬅︎ Intercept |
B | 1 | 0 | 0 | |
C | 0 | 1 | 0 | |
D | 0 | 0 | 1 | |
⬆︎ AD |
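R builds exactly this coding automatically for a factor. A sketch with a hypothetical four-level factor mirroring the table above (groups A through D):

groups <- factor(c("A", "B", "C", "D"))

# Default treatment contrasts: the first level (A) is the reference,
# and each remaining level gets its own 0/1 dummy column
model.matrix(~ groups)
#   (Intercept) groupsB groupsC groupsD
# 1           1       0       0       0
# 2           1       1       0       0
# 3           1       0       1       0
# 4           1       0       0       1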
Let’s see some examples…
lm(mpg ~ factor(cyl), data = mtcars)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
Intercept | 26.664 | 0.972 | 27.437 | 0 |
6-cyl | -6.921 | 1.558 | -4.441 | 0 |
8-cyl | -11.564 | 1.299 | -8.905 | 0 |
mtcars |>
  lm(mpg ~ factor(cyl), data = _) |>
  summary()
Term | Estimate | Std. Error | t | p.value |
---|---|---|---|---|
Intercept | 26.66 | 0.97 | 27.44 | 0e+00 |
6 | -6.92 | 1.56 | -4.44 | 1e-04 |
8 | -11.56 | 1.30 | -8.90 | 0e+00 |
mtcars |>
  group_by(cyl) |>
  summarize(avg = mean(mpg), sd = sd(mpg))
cyl | avg | sd |
---|---|---|
4 | 26.66 | 4.51 |
6 | 19.74 | 1.45 |
8 | 15.1 | 2.56 |
In the default coding, 6-cyl is not compared to 8-cyl. Releveling so that 6-cyl is the reference group gives that comparison:
Term | Estimate | Std. Error | Statistic | p.value |
---|---|---|---|---|
Intercept | 19.74 | 1.22 | 16.21 | 0.0000 |
cyl4 | 6.92 | 1.56 | 4.44 | 0.0001 |
cyl8 | -4.64 | 1.49 | -3.11 | 0.0042 |
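A table like this can be produced by releveling the factor so that 6-cyl is the reference group. The relevel() call below is a sketch of one way to do it (assuming dplyr is loaded, as in the summaries above), not necessarily how the slide's model was fit:

mtcars |>
  mutate(cyl = relevel(factor(cyl), ref = "6")) |>
  lm(mpg ~ cyl, data = _) |>
  summary()
# The intercept is now the 6-cyl mean (19.74); cyl4 and cyl8 are each
# compared against 6-cyl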
Dummy variables
Categorical and continuous predictors
\[vocab \sim age\]
Call:
lm(formula = vocab ~ ages, data = vocab_sample)
Residuals:
Min 1Q Median 3Q Max
-6047.8 -1665.3 19.7 1865.3 6449.0
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1196.61 546.82 -2.188 0.0298 *
ages 1397.87 53.84 25.962 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2444 on 198 degrees of freedom
Multiple R-squared: 0.7729, Adjusted R-squared: 0.7718
F-statistic: 674.1 on 1 and 198 DF, p-value: < 2.2e-16
\[vocab \sim age + reader\]
Call:
lm(formula = vocab ~ ages + reader_type, data = vocab_sample)
Residuals:
Min 1Q Median 3Q Max
-4166.1 -1236.6 62.4 1190.2 4909.5
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2254.98 394.00 -5.723 3.84e-08 ***
ages 1338.25 38.32 34.925 < 2e-16 ***
reader_typefrequent 3474.07 246.42 14.098 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1729 on 197 degrees of freedom
Multiple R-squared: 0.887, Adjusted R-squared: 0.8858
F-statistic: 773 on 2 and 197 DF, p-value: < 2.2e-16
Dummy variables
Categorical and continuous predictors - mixed GLMs
\[vocab \sim age + reader + age:reader\]
Call:
lm(formula = vocab ~ ages * reader_type, data = vocab_sample)
Residuals:
Min 1Q Median 3Q Max
-3673.5 -1036.0 22.7 1027.0 3804.4
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -138.67 475.28 -0.292 0.771
ages 1110.73 48.42 22.939 < 2e-16 ***
reader_typefrequent -1027.18 705.63 -1.456 0.147
ages:reader_typefrequent 465.73 69.28 6.723 1.9e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1562 on 196 degrees of freedom
Multiple R-squared: 0.9082, Adjusted R-squared: 0.9067
F-statistic: 646 on 3 and 196 DF, p-value: < 2.2e-16
Characteristic | vocab ~ age | | | vocab ~ age + reader_type | | | vocab ~ ages * reader_type | | |
---|---|---|---|---|---|---|---|---|---|
 | Beta | 95% CI | p-value | Beta | 95% CI | p-value | Beta | 95% CI | p-value |
(Intercept) | -1,197 | -2,275, -118 | 0.030 | -2,255 | -3,032, -1,478 | <0.001 | -139 | -1,076, 799 | 0.8 |
ages | 1,398 | 1,292, 1,504 | <0.001 | 1,338 | 1,263, 1,414 | <0.001 | 1,111 | 1,015, 1,206 | <0.001 |
reader_type | | | | | | | | | |
average | | | | — | — | | — | — | |
frequent | | | | 3,474 | 2,988, 3,960 | <0.001 | -1,027 | -2,419, 364 | 0.15 |
ages * reader_type | | | | | | | | | |
ages * frequent | | | | | | | 466 | 329, 602 | <0.001 |

Abbreviation: CI = Confidence Interval
Analysis of Variance Table
Model 1: vocab ~ 1
Model 2: vocab ~ ages
Model 3: vocab ~ ages + reader_type
Model 4: vocab ~ ages * reader_type
Res.Df RSS Df Sum of Sq F Pr(>F)
1 199 5208897978
2 198 1182686093 1 4026211885 1649.497 < 2.2e-16 ***
3 197 588724259 1 593961835 243.340 < 2.2e-16 ***
4 196 478410910 1 110313349 45.194 1.901e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
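A sketch of how a nested comparison like this is run; the model formulas follow the lm() calls shown above, and vocab_sample is the slides' data set (not reproduced here):

m0 <- lm(vocab ~ 1, data = vocab_sample)                    # intercept only
m1 <- lm(vocab ~ ages, data = vocab_sample)                 # + age
m2 <- lm(vocab ~ ages + reader_type, data = vocab_sample)   # + reader type
m3 <- lm(vocab ~ ages * reader_type, data = vocab_sample)   # + interaction

# Each model is tested against the previous, more restricted one
anova(m0, m1, m2, m3)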
The vocabulary data were analyzed using a general linear model. Estimated vocabulary size was the criterion with age and reader type (frequent/average) as predictors. The reader type factor was dummy coded with average readers set as the reference group. Main effects and the age by reader type interaction were assessed using nested model comparisons. Experiment-wise alpha was set at 0.05.
There was a main effect of age (F(1) = 1649.50, p < 0.001), reader type (F(1) = 243.34, p < 0.001), as well as an age by reader type interaction (F(1) = 45.19, p < 0.001). The model containing the interaction provided the best fit to the data (R² = 0.91). Overall, vocabulary size increased as a function of age. However, the size of the effect was modulated by reader type. Specifically, average readers showed an increase of approximately 1,111 words ± 48.42 SE (t = 22.94, p < 0.001) per year. Frequent readers showed an additional increase of 466 words ± 69.28 SE per year (1,577 words total, t = 6.72, p < 0.001).