library(ggplot2)

# Scatterplot of mpg against disp with a linear fit
mtcars |>
  ggplot() +
  aes(x = disp, y = mpg) +
  geom_point() +
  geom_smooth(method = 'lm', formula = y ~ x)
Joseph V. Casillas, PhD
Rutgers University
Last update: 2025-01-04
This process should be version controlled!
So what is version
This means we write code in a way that clearly documents what we did.
Instead of writing code with the purpose of telling the computer what to do, we write code that tells other humans what we told the computer to do and why.
Importantly, we don’t separate our code from the report/essay/manuscript we are writing. Everything is together, in a single document (usually).
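As a rough illustration of keeping code and prose together, the sketch below computes a value and builds the sentence that reports it from that value, so the text cannot drift away from the analysis. It assumes the built-in mtcars data and a simple linear model of mpg on disp; the object names (fit, slope, report) are made up for this example and are not taken from the course materials.

```r
# Fit a simple linear model: fuel economy as a function of displacement
fit <- lm(mpg ~ disp, data = mtcars)

# Pull the slope out of the fitted model instead of typing it by hand
slope <- coef(fit)[["disp"]]

# Build the sentence for the report directly from the computed value
report <- sprintf(
  "For every one-unit increase in disp, mpg changes by %.3f.",
  slope
)
print(report)
```

In an R Markdown document the same idea is usually expressed with code chunks and inline code, so the number in the sentence is recomputed every time the document is rendered.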
In this class you will learn to…
manage version-controlled research projects in a way that facilitates collaboration and honesty
get and tidy data
transform and visualize your data
fit statistical models to your data and test hypotheses
communicate your results using literate programming
This is reproducible research
Programs and packages
Programs we will use
Slack
R
R is the statistical programming language we will learn about in this class.
You can download R here: https://cran.r-project.org
Need help? Instructions
RStudio (Posit, Positron)
We will interface with R using RStudio (Posit), a fully featured IDE.
RStudio (Posit) is available to download here:
https://posit.co/download/rstudio-desktop/#download
Need help? Instructions
R packages we will use
Obligatory
tidyverse: Install and load tidyverse packages
ds4ling: Functions and datasets used in this course
knitr: Dynamic report generation
rmarkdown: Dynamic documents
papaja: Reproducible APA manuscripts in RMarkdown
xaringan: HTML presentations in RMarkdown
here: Reproducible way to set working directory
devtools: Install packages from GitHub
You can download a package in R using the following command:
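The command itself is not included in this copy of the slides. A minimal sketch, assuming the packages are distributed on CRAN, uses base R's install.packages(). The helper missing_pkgs() is hypothetical and exists only for this example, and the package vector is a partial selection from the list above.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed yet
missing_pkgs <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# A few of the obligatory packages that are available on CRAN
to_install <- missing_pkgs(
  c("tidyverse", "knitr", "rmarkdown", "here", "devtools")
)

# Install only what is missing. This needs internet access, so the sketch
# limits it to interactive sessions.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

For a single package the short form is install.packages("tidyverse"). Installing is needed once per machine, while library(tidyverse) is needed in every session that uses the package.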
Helpful
lme4: Multilevel models
brms: Bayesian data analysis
patchwork: Combine ggplots
broom: Stat models to tidy dataframe
learnr: Interactive tutorials
stringr: For manipulating strings
sjPlot: For making plots and tables from model objects
GitHub
GitHub is a web-based hosting service for Git version control repositories.
It is mostly used for computer code (like Dropbox for nerds).
We will use GitHub for project management and sharing reproducible reports.
Need help? Instructions
GitHub Desktop
GitHub Desktop is an app that can make interacting with Git much easier.
You can download the app here: https://desktop.github.com
Data Science for Linguists