---
<img src="index_files/figure-html/lin-log-comp-1.png" width="100%" />
---
# Logistic regression
### What you need to know
.large[
- Logistic regression is the most appropriate way to model binary response
variables (0/1)
- The model calculates the probability that y = 1, i.e., the probability of a
"success", or presence of something
- Model output from logistic regression is similar to `lm()`
- Model interpretation "works" the same way, i.e., a 1-unit change in
`predictor` is associated with a change of X in the criterion
- But... the parameter estimates represent changes in the log-odds of y = 1
- This is much less intuitive and more difficult to understand without some
math (see the sketch below)
]
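
To see how the log-odds scale relates to probability, here is a quick base-R
illustration (a sketch, not tied to any particular dataset):

```r
# plogis() converts log-odds to probabilities; qlogis() does the reverse
plogis(0)              # log-odds of 0 = probability of 0.5
qlogis(0.5)            # probability of 0.5 = log-odds of 0
# A constant 1-unit change in log-odds is NOT a constant change in probability
plogis(1) - plogis(0)  # ~0.23
plogis(3) - plogis(2)  # ~0.07
```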
---
# Logistic regression
### Example
.large[
- You are interested in understanding the perception of stop voicing in
English bilabials
- You conducted an experiment in which participants heard a range of
bilabial stops that differed in voice-onset time
- The stimuli ranged from -60 ms to 60 ms in 10 ms increments
- Participants were presented stimuli drawn at random from the continuum and
identified the sounds as /b/'s or /p/'s
- A /p/ response is coded as a 1 (a simulated dataset in this format is
sketched below)
]
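
For concreteness, here is one way data with this structure might be simulated;
the dataset name matches the slides that follow, but the coefficients are
rough illustrative assumptions, not values estimated from the experiment:

```r
# Hypothetical sketch of the assumed structure of vot_logistic_data
set.seed(1)
vot_logistic_data <- data.frame(
  vot = rep(seq(-60, 60, by = 10), each = 200)  # continuum in 10 ms steps
)
# P(/p/ response) increases with VOT (coefficients are illustrative)
vot_logistic_data$resp <- rbinom(
  n = nrow(vot_logistic_data), size = 1,
  prob = plogis(-0.85 + 0.057 * vot_logistic_data$vot)
)
```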
---
background-image: url(./assets/img/vot.png)
background-size: contain
---
# Logistic regression
.left-column[
<br><br><br><br>
```r
# Fit a logistic regression: binary response as a function of VOT
mod_log <- glm(
  resp ~ vot,
  data = vot_logistic_data,
  family = "binomial"
)
```
]
.right-column[
```
Call:
glm(formula = resp ~ vot, family = "binomial", data = vot_logistic_data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3085  -0.6583  -0.2198   0.6503   2.9320  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.84614    0.06089   -13.9   <2e-16 ***
vot          0.05731    0.00213    26.9   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 3482.8  on 2599  degrees of freedom
Residual deviance: 2063.6  on 2598  degrees of freedom
AIC: 2067.6

Number of Fisher Scoring iterations: 5
```
]
---
# Logistic regression
### Example
.Large[
- We can convert the log-odds to probabilities by calculating the inverse
logit<sup>1</sup>
]
```r
inv_logit(mod_log) %>% kable(., format = 'html')
```
<table>
<thead>
<tr>
<th style="text-align:left;"> variables </th>
<th style="text-align:right;"> betas </th>
<th style="text-align:right;"> prob </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> (Intercept) </td>
<td style="text-align:right;"> -0.8461443 </td>
<td style="text-align:right;"> 0.3002423 </td>
</tr>
<tr>
<td style="text-align:left;"> vot </td>
<td style="text-align:right;"> 0.0573077 </td>
<td style="text-align:right;"> 0.5143230 </td>
</tr>
</tbody>
</table>
--
.Large[
- This is still difficult to interpret... a plot might help.
]
<br><br>
<sup>1</sup> Inverse logit = `\(\frac{1}{1 + \exp(-x)}\)`
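
The `inv_logit()` helper used above is not part of base R; a minimal sketch of
what such a helper might do, built on `plogis()` (base R's inverse logit),
could look like this (an assumption about the helper, not its actual
definition):

```r
# Return each coefficient of a fitted glm alongside its inverse logit
inv_logit <- function(model) {
  betas <- coef(model)
  data.frame(
    variables = names(betas),
    betas = unname(betas),
    prob = plogis(unname(betas))  # inverse logit: 1 / (1 + exp(-x))
  )
}
```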
---
class: middle
<img src="index_files/figure-html/vot-plot-1.png" width="100%" />
---
class: middle
.pull-left[
```r
inv_logit(mod_log) %>% kable(., format = 'html')
```
<table>
<thead>
<tr>
<th style="text-align:left;"> variables </th>
<th style="text-align:right;"> betas </th>
<th style="text-align:right;"> prob </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> (Intercept) </td>
<td style="text-align:right;"> -0.8461443 </td>
<td style="text-align:right;"> 0.3002423 </td>
</tr>
<tr>
<td style="text-align:left;"> vot </td>
<td style="text-align:right;"> 0.0573077 </td>
<td style="text-align:right;"> 0.5143230 </td>
</tr>
</tbody>
</table>
<br>
- Now the intercept is interpretable
(note that the VOT continuum is
already centered at 0 ms)
- What does the parameter estimate
for VOT mean?
- Can we calculate how the probability
differs from one specific point to
another?
]
.pull-right[
<img src="index_files/figure-html/repeat-glm-vot-plot-1.png" width="100%" />
]
---
class: middle
.pull-left[
#### We can use the model coefficients
<table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;">
<thead>
<tr>
<th style="text-align:left;"> term </th>
<th style="text-align:right;"> estimate </th>
<th style="text-align:right;"> std.error </th>
<th style="text-align:right;"> statistic </th>
<th style="text-align:right;"> p.value </th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;"> (Intercept) </td>
<td style="text-align:right;"> -0.846 </td>
<td style="text-align:right;"> 0.061 </td>
<td style="text-align:right;"> -13.897 </td>
<td style="text-align:right;"> 0 </td>
</tr>
<tr>
<td style="text-align:left;"> vot </td>
<td style="text-align:right;"> 0.057 </td>
<td style="text-align:right;"> 0.002 </td>
<td style="text-align:right;"> 26.899 </td>
<td style="text-align:right;"> 0 </td>
</tr>
</tbody>
</table>
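
A coefficient table like the one above can be produced with the `broom`
package (an assumption about how this table was generated):

```r
library(broom)
tidy(mod_log)  # columns: term, estimate, std.error, statistic, p.value
```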
- Calculate the inverse logit of the
linear equation:
`$$\alpha + \beta_{VOT} \times 10\ \text{ms}$$`
```r
plogis(-0.846 + 0.057 * 10)
```
```
## [1] 0.4314347
```
- What about the change in probability of selecting /p/ when shifting from 10 ms
to 20 ms?
```r
plogis(-0.846 + 0.057 * 20) -
  plogis(-0.846 + 0.057 * 10)
```
```
## [1] 0.1415404
```
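
Equivalently, `predict()` with `type = "response"` returns these
probabilities directly from the fitted model:

```r
# Same probabilities without hand-copying (rounded) coefficients
p <- predict(mod_log,
             newdata = data.frame(vot = c(10, 20)),
             type = "response")
p[2] - p[1]  # change in P(/p/) from 10 ms to 20 ms
```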
]
.pull-right[
<img src="index_files/figure-html/repeat-glm-vot-plot2-1.png" width="100%" />
The shift from 10 ms to 20 ms VOT corresponds
to an increase of approximately 14 percentage
points in the probability of selecting /p/
]
---
# Logistic regression
### Summary
.large[
- Logistic regression is a powerful tool for modeling binary data
- The `glm()` function works similarly to the `lm()` function
- We test for main effects and interactions the same way too, i.e., using
nested model comparisons with the `anova()` function (see the sketch below)
- We specify the exponential family and its link function with
`family = binomial(link = "logit")`
- Interpretation of logistic regression works the same way as classic linear
regression
- Parameter estimates are given in log-odds (and require some work to
interpret accurately)
]
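
As a sketch of the nested comparison mentioned above (`mod_null` here is a
hypothetical intercept-only model, not fit earlier in these slides):

```r
# Does vot improve fit over an intercept-only model?
mod_null <- glm(resp ~ 1, data = vot_logistic_data, family = "binomial")
anova(mod_null, mod_log, test = "Chisq")  # likelihood ratio test
```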
---
layout: false
class: middle