Joseph V. Casillas, PhD
Rutgers University
Last update: 2025-04-25
There isn’t one true population parameter…
but an entire distribution of parameters, some more plausible than others
Classical: There is a single “true” line of best fit, and I’ll give my best estimate of it.
Bayesian: There is a distribution of lines of best fit… some more plausible than others… and I’ll give you samples from that distribution.
More cars
We’ll model the mpg variable from the mtcars dataset.
Frequentist model
We’ll fit an intercept-only model:
lm(mpg ~ 1, data = mtcars)
\[ \begin{align} mpg_{i} & \sim N(\mu_{i}, \sigma) \\ \mu_{i} & = \alpha \end{align} \]
The mean mpg is 20.09 ± 6.03 SD.
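Those two numbers are just the sample statistics of the outcome; a quick check in R confirms them:

```r
# Sample mean and SD of mpg in mtcars (the values quoted above)
mean(mtcars$mpg)  # 20.09062
sd(mtcars$mpg)    # 6.026948
```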
Call:
lm(formula = mpg ~ 1, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-9.6906 -4.6656 -0.8906 2.7094 13.8094
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.091 1.065 18.86 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.027 on 31 degrees of freedom
Bayesian model
We’ll fit an intercept-only model:
brm(mpg ~ 1, data = mtcars)
\[ \begin{align} mpg_{i} & \sim N(\mu_{i}, \sigma) \\ \mu_{i} & = \alpha \end{align} \]
The mean mpg is 20.09 ± 6.03 SD.
Family: gaussian
Links: mu = identity; sigma = identity
Formula: mpg ~ 1
Data: mtcars (Number of observations: 32)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 20.06 1.04 18.06 22.25 1.00 2881 2280
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 6.14 0.79 4.84 7.87 1.00 3106 2113
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.09062 1.065424 18.85693 1.526151e-18
The mean mpg is 20.09 ± 6.03 SD.
b_Intercept | sigma |
---|---|
21.93103 | 5.183423 |
21.46296 | 5.282083 |
19.08187 | 6.228794 |
20.53519 | 5.165951 |
19.68760 | 4.864501 |
20.81836 | 8.179158 |
19.45889 | 5.133981 |
19.96603 | 5.367908 |
20.35175 | 6.781768 |
19.92893 | 5.826664 |
20.19183 | 6.904526 |
21.06296 | 6.069348 |
19.70825 | 6.274392 |
20.85624 | 6.445802 |
21.56705 | 5.920678 |
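The table above is a handful of individual draws from the posterior. With brms you can pull them out of the fitted model directly (a minimal sketch; fit_bayes is just my name for the intercept-only fit from above):

```r
library(brms)

fit_bayes <- brm(mpg ~ 1, data = mtcars)

# One row per post-warmup draw; b_Intercept and sigma are the columns shown above
draws <- as_draws_df(fit_bayes)
head(draws[, c("b_Intercept", "sigma")])
```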
Let’s talk about probability
\[ P(\color{green}{y}) \]
\[ P(\color{green}{y} | \color{red}{z} ) \]
\[ P(A | B) = \frac{P(B | A) \times P(A)}{P(B)} \]
The probability of A given B is equal to the probability of B given A, times the probability of A, divided by the probability of B.
You’ll probably never actually use this directly, even though most texts on Bayesian inference start with it and with base-rate fallacy examples.
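For reference, here’s the flavor of base-rate example those texts lead with, using made-up numbers: a condition affects 1% of a population, a test catches it 95% of the time, and it false-alarms 5% of the time. The probability of actually having the condition given a positive test is

\[ P(\text{cond} \mid +) = \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} \approx 0.16 \]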
The posterior distribution is the product of the likelihood and the prior…
(divided by the normalizing constant)
\[ Posterior = \frac{Likelihood \times Prior}{Normalizing\ constant} \]
\[ updated\ knowledge \propto Likelihood\ of\ data \times prior\ beliefs \]
We can think of Bayes’ theorem as a principled/systematic method for updating our knowledge in light of new data.
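To make the updating concrete, here is a minimal grid-approximation sketch in R (the grid, the prior, and the fixed sigma are arbitrary illustrative choices): multiply a likelihood by a prior, divide by the normalizing constant, and you have a posterior for the mean of mpg.

```r
# Grid approximation of the posterior for the mean of mpg,
# holding sigma fixed at the sample SD to keep things simple
mu_grid    <- seq(15, 25, length.out = 200)                  # candidate values of mu
prior      <- dnorm(mu_grid, mean = 20, sd = 10)             # a vague prior (arbitrary)
likelihood <- sapply(mu_grid, function(mu) {
  prod(dnorm(mtcars$mpg, mean = mu, sd = sd(mtcars$mpg)))    # P(data | mu)
})
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)                      # normalizing constant

mu_grid[which.max(posterior)]                                # lands near the sample mean, ~20.1
```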
\[ P(unknown | observed) = \frac{\color{red}{P(observed | unknown)} \times P(unknown)}{P(observed)} \]
In essence, the frequentist approach assesses the probability of the data given a specific hypothesis
\[ P(data | H_{x}) \]
Through Bayes’ formula, one is able to assess the probability of a specific hypothesis given the data
\[ P(H_{x} | data) \]
A Bayesian estimate with flat priors is generally equivalent to a frequentist MLE
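One way to convince yourself (a minimal sketch; as far as I recall, an empty string in set_prior() is how brms requests a flat prior, and the object names are mine):

```r
library(brms)

fit_mle  <- lm(mpg ~ 1, data = mtcars)
fit_flat <- brm(mpg ~ 1, data = mtcars,
                prior = set_prior("", class = "Intercept"))  # flat (improper) prior

coef(fit_mle)    # MLE of the intercept: 20.09
fixef(fit_flat)  # posterior mean should land essentially on top of the MLE
```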
Frequentism is not wrong, just different
Frequentism is susceptible to QRPs (questionable research practices)
It’s about using the right tool for the job
Do whatever you need to do to answer your questions
We’ll use the sleepstudy dataset from the lme4 package to model reaction time (Reaction) as a function of days of sleep deprivation (Days).
tibble [183 × 4] (S3: tbl_df/tbl/data.frame)
$ Reaction: num [1:183] 250 259 251 321 357 ...
$ Days : num [1:183] 0 1 2 3 4 5 6 7 8 9 ...
$ Subject : chr [1:183] "308" "308" "308" "308" ...
$ is_extra: logi [1:183] FALSE FALSE FALSE FALSE FALSE FALSE ...
Reaction | Days | Subject | is_extra |
---|---|---|---|
249.5600 | 0 | 308 | FALSE |
258.7047 | 1 | 308 | FALSE |
250.8006 | 2 | 308 | FALSE |
321.4398 | 3 | 308 | FALSE |
356.8519 | 4 | 308 | FALSE |
414.6901 | 5 | 308 | FALSE |
382.2038 | 6 | 308 | FALSE |
290.1486 | 7 | 308 | FALSE |
430.5853 | 8 | 308 | FALSE |
466.3535 | 9 | 308 | FALSE |
222.7339 | 0 | 309 | FALSE |
205.2658 | 1 | 309 | FALSE |
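The frequentist version is a standard lme4 fit with by-subject random intercepts and slopes (a sketch; the formula and data name are taken from the output below):

```r
library(lme4)

fit_lmer <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject), data = df_sleep)
summary(fit_lmer)
```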
Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
Data: df_sleep
REML criterion at convergence: 1771.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.9707 -0.4703 0.0276 0.4594 5.2009
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 582.72 24.140
Days 35.03 5.919 0.07
Residual 649.36 25.483
Number of obs: 183, groups: Subject, 20
Fixed effects:
Estimate Std. Error t value
(Intercept) 252.543 6.433 39.257
Days 10.452 1.542 6.778
Correlation of Fixed Effects:
(Intr)
Days -0.137
\[ \begin{aligned} Reaction_{ij} & \sim normal(\mu_{ij}, \sigma) \\ \mu_{ij} & = \alpha + \beta \times Days_{ij} \\ \alpha & \sim normal(0, 1) \\ \beta & \sim normal(0, 1) \\ \sigma & \sim normal(0, 1) \end{aligned} \]
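In brms the corresponding call is a one-liner (a minimal sketch; the formula and data match the output below, and I leave the priors at brms defaults rather than spelling out the schematic ones above):

```r
library(brms)

fit_brm <- brm(Reaction ~ Days + (Days | Subject), data = df_sleep)
summary(fit_brm)
```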
Family: gaussian
Links: mu = identity; sigma = identity
Formula: Reaction ~ Days + (Days | Subject)
Data: df_sleep (Number of observations: 183)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~Subject (Number of levels: 20)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 25.88 6.26 15.14 39.91 1.00 1793 2244
sd(Days) 6.58 1.53 4.14 10.09 1.00 1581 2319
cor(Intercept,Days) 0.09 0.30 -0.46 0.66 1.00 1047 1656
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 252.18 6.80 238.81 265.63 1.00 1909 2649
Days 10.34 1.69 6.88 13.61 1.00 1314 1918
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 25.79 1.52 23.03 28.93 1.00 3622 2961
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
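Once the chains have run, the fit is easy to interrogate; two standard follow-ups (a sketch, using the fit_brm name from above):

```r
# Posterior predictive check: do data simulated from the model resemble the observed reaction times?
pp_check(fit_brm, ndraws = 50)

# Population-level effect of Days with its credible interval
conditional_effects(fit_brm, effects = "Days")
```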
“We put a tiny sliver of scientific information into the model and then we bet on the things that can happen a large number of ways”