  • License: CC-BY-SA

Data Science for Psychologists

108 Good Resources

  • https://psychnerdjae.github.io/into-the-tidyverse/
  • Automatic Grading with RMarkdown example
  • Git/GitHub for virtual learning (from this tweet)
  • Learn-Datascience-for-Free
  • https://allisonhorst.shinyapps.io/dplyr-learnr/

108.1 Cheatsheets

RStudio has a glorious number of cheatsheets, including:

  • Data Wrangling