
Data Science for Psychologists

Workshop Links

The links below collect the resources referenced in the course materials: slide decks, activities, repositories, documentation, datasets, and further reading.

105.3 Course Resources

  • Course Website
  • Slides and Workshop Materials

105.3.1 Individual Slide Decks

  • Welcome Toolkit
  • Data Visualization
  • ggplot2 Overview
  • Visualizing Numeric and Categorical
  • Tidy Data
  • Grammar of Data Manipulation
  • Wrangling Practice
  • More ggplot Customization

105.4 Workshop Activities

  • Bechdel + R Markdown Activity: a hands-on activity for practicing R Markdown and data storytelling. Download the assignment from the GitHub repository:
    • ae-02-bechdel-rmarkdown
    • rmd file

105.5 GitHub Repositories

  • DataScience4Psych GitHub
  • Bechdel RMarkdown Activity
  • Tidy Tuesday Dataset (Feb 11, 2020)

105.6 Documentation and Cheat Sheets

  • ggplot2 Documentation
  • RMarkdown Cheat Sheet
  • Viridis Color Package
  • Tibble Documentation

105.7 Data Sources

  • US Census Data
  • Gapminder Dataset

105.8 Further Reading and Tools

  • Minard on Wikipedia
  • Science Direct Article
  • ModernDive
  • Stat545 Course Material
  • DataScienceBox
  • Support Customizing RStudio

105.9 Miscellaneous and Additional Resources

  • JFukuyama GitHub Pages
  • Hyperwar Statistical Digest
  • Blog on Grammar of Graphics

Together, these links cover the resources and external documentation needed to work through the workshop content.