Data Science for Linguists

Residuals

Joseph V. Casillas, PhD

Rutgers University
Spring 2025
Last update: 2025-05-04

What is Data Science again?

You have learned how to version control this process!

So what is version control (again)?

Don’t forget the stats…







library(ggplot2)

# Scatterplot of displacement vs. fuel economy with a fitted regression line
mtcars |>
  ggplot() +
  aes(x = disp, y = mpg) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
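The line drawn by `geom_smooth()` corresponds to an ordinary least squares fit. As a minimal sketch (the object names `mod` and `res` are illustrative, not from the slides), the same model can be fit directly with `lm()` to inspect its coefficients and residuals:

```r
# Fit the same straight-line model that the smoother draws on the plot
mod <- lm(mpg ~ disp, data = mtcars)

# Intercept and slope of the fitted line
coef(mod)

# Residuals: observed mpg minus the mpg predicted by the line
res <- residuals(mod)
head(res)

# With an intercept in the model, the residuals sum to (numerically) zero
sum(res)
```

Plotting `res` against `fitted(mod)` is one common way to check whether a straight line is a reasonable description of the data.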

We do literate programming

  • This means we write code in a way that clearly documents what we did.

  • Instead of writing code with the purpose of telling the computer what to do, we write code that tells other humans what we told the computer to do and why.

  • Importantly, we don’t separate our code from the report/essay/manuscript we are writing. Everything is together, in a single document (usually).
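One way to picture the idea is a sentence in a report whose numbers come from the code itself. The sketch below uses base R only; the variable names and the wording of the sentence are hypothetical. In an R Markdown or Quarto document the same values would typically be inserted with inline code chunks, which is assumed here to be the course's tooling.

```r
# Compute the quantities the report will mention
mod   <- lm(mpg ~ disp, data = mtcars)
slope <- coef(mod)[["disp"]]
r2    <- summary(mod)$r.squared

# Build the sentence from the computed values instead of typing numbers by hand
report_sentence <- sprintf(
  "Fuel economy decreased by %.3f mpg per cubic inch of displacement (R2 = %.2f).",
  abs(slope), r2
)
cat(report_sentence, "\n")
```

If the data or the model change, the sentence updates with them, so the text and the analysis cannot drift apart.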

In this class you have learned to…

manage version controlled research projects

in a way that facilitates collaboration and honesty

get and tidy data

transform and visualize your data

fit statistical models to your data and test hypotheses

communicate your results using literate programming

This is reproducible research
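The get, tidy, transform steps listed above can be sketched in a few lines. This example deliberately uses only base R so that it runs without extra packages; the column names `model`, `transmission`, and `l_per_100km` are illustrative choices, not part of the course materials.

```r
# Get the data: mtcars ships with R; car names are stored as row names
cars <- mtcars

# Tidy: move the row names into a proper column and label the transmission
cars$model <- rownames(cars)
rownames(cars) <- NULL
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("automatic", "manual"))

# Transform: convert miles per (US) gallon to litres per 100 km
cars$l_per_100km <- 235.215 / cars$mpg

# Summarize: mean fuel consumption by transmission type
by_trans <- aggregate(l_per_100km ~ transmission, data = cars, FUN = mean)
by_trans
```

Because every step is code, anyone with the script can rerun it and arrive at the same table.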

What we’ve seen

  • MRC
  • Linear regression
  • General linear model
  • Generalized linear model
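As a rough illustration of how these model types relate in R, the sketch below fits one of each to `mtcars`. MRC is read here as multiple regression/correlation; the choice of predictors and the object names are assumptions made for the example.

```r
# Linear regression: one continuous predictor
m_simple <- lm(mpg ~ disp, data = mtcars)

# Multiple regression (MRC): several continuous predictors
m_multi <- lm(mpg ~ disp + wt, data = mtcars)

# General linear model: continuous and categorical predictors together
m_general <- lm(mpg ~ disp + factor(cyl), data = mtcars)

# Generalized linear model: binary outcome (am) with a logit link
m_logit <- glm(am ~ wt, data = mtcars, family = binomial(link = "logit"))

# Number of estimated coefficients in each model
n_coef <- sapply(
  list(simple = m_simple, multi = m_multi,
       general = m_general, logit = m_logit),
  function(m) length(coef(m))
)
n_coef
```

The first three models all use `lm()`; only the generalized linear model needs `glm()` and a `family` argument, because its outcome is not modeled as normally distributed.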


What we’ve seen

Frequentist

  • Bayesian MRC
  • Bayesian Linear regression
  • Bayesian General linear model
  • Bayesian Generalized linear model

Next steps

Moving forward

  • Unfortunately we don’t have enough time to go into more detail
  • Your journey with programming and statistics is just getting started
  • Use your knowledge to think critically about what you read and about your own data
  • New techniques and methods are constantly coming out, but it seems unlikely we will stray too far from the linear model
  • Book recommendations (see References)

Moving forward

  • Getting help
    • stackoverflow.com
    • coursera.org
    • datacamp.com

  • Coding (things to learn)
    • functional programming
    • leaving the tidyverse…
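For a first taste of functional programming outside the tidyverse, here is a small base R sketch. The helper `fit_r2()` and the list of formulas are hypothetical, chosen only to show functions being passed to other functions.

```r
# A list of candidate model formulas for the same outcome
formulas <- list(
  disp    = mpg ~ disp,
  wt      = mpg ~ wt,
  disp_wt = mpg ~ disp + wt
)

# A small function that fits one model and returns its R-squared
fit_r2 <- function(f, data) summary(lm(f, data = data))$r.squared

# Apply the function to every formula instead of writing a for loop
r2 <- vapply(formulas, fit_r2, numeric(1), data = mtcars)
r2

# Functions can be passed around like data: keep models above a threshold
Filter(function(x) x > 0.75, r2)
```

`vapply()` is used rather than `sapply()` because it checks that every result is a single number, which makes mistakes surface early.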



www.ds4ling.jvcasillas.com


joseph.casillas@rutgers.edu
@jvcasill
@jvcasillas

References

Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.
Gelman, A., Hill, J., & Vehtari, A. (2021). Regression and other stories. Cambridge University Press.
McElreath, R. (2015). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press.