Data Science for Psychologists

104 Natural Language Processing

Resources:

  • https://www.vox.com/future-perfect/2019/2/14/18222270/artificial-intelligence-open-ai-natural-language-processing
  • https://app.inferkit.com/demo