class: center, middle, inverse, title-slide .title[ # Welcome to Data Science
for Psychologists
🐱 ] .author[ ### S. Mason Garrison
Assistant Professor
Wake Forest University ] --- layout: true <div class="my-footer"> <span> <a href="https://DataScience4Psych.github.io/DataScience4Psych/" target="_blank">Data Science for Psychologists</a> </span> </div> --- class: center, middle # Data Science for Psychologists Workshop ## APS Annual Convention ### Mason Garrison #### Assistant Professor of Quantitative Psychology #### Wake Forest University #### wifi: psychscience23 --- class: middle # A Little About Me --- # Who Am I? .pull-left[ - My name is S. Mason Garrison - I'm an Assistant Professor of Quantitative Psychology at - Wake Forest University - I'm a Quantitative Psychologist and - Behavior Geneticist by training. ] .center.pull-right[ <img src="img/greenmason.jpg" width="90%" style="display: block; margin: auto;" /> ] --- ## My Work <!-- rewrite --> - My substantive work focuses on examining the relationships between socioeconomic status, individual differences, and health. - I use genetically-informed designs to approach these questions, leveraging cousins, siblings, and other kin. - I rely heavily on pre-existing data sets, working with tons data, quite literally in some cases. - My current work is heavily centered around data modeling for all people of Utah and their ancestors. - We've (Mike Hunter & Alex Burt) developed some novel pedigree models to comparing cousins up to the 11th degree. - At the moment, we're looking at outcomes that the National Institute on Aging (NIA) is interested in.... --- ## Why I'm Here Today <!-- rewrite -->.large[ - I'm passionate about sharing my love for R and data science with others, especially those within the field of psychology. - I believe that the tools and methodologies of data science can - meaningfully enhance our ability to conduct efficient and impactful research in psychology - I'm excited to be here and share this course with you! ] --- ## Fun Facts About Me .pull-left.large[ - I'm a cat lover! (as you might have guessed from my use of 🐱) - I enjoy automating things and *pretty active* on wikipedia. - I'm a chai tea enthusiast! ] -- .pull-right[ <img src="img/wiki.png" width="100%" style="display: block; margin: auto;" /> ] --- class: middle # Workshop Overview! --- # Workshop Disclaimer .pull-left[ - This workshop is adapted from my semester-long Data Science for Psychologists class. - That class is fully-flipped, so you're going to experience the live version. - However, I definitely encourage you to check out the full thing. ] .pull-right[ <img src="img/rbook.png" width="95%" style="display: block; margin: auto;" /> ] <img src="d01_02_welcome_toolkit_files/figure-html/unnamed-chunk-5-1.png" width="10%" style="display: block; margin: auto;" /> .center[ [DataScience4Psych.github.io/DataScience4Psych/](https://DataScience4Psych.github.io/DataScience4Psych/) ] --- ```r library(qrcode) code <- qr_code("https://DataScience4Psych.github.io/DataScience4Psych/") plot(code) ``` <img src="d01_02_welcome_toolkit_files/figure-html/qr-1.png" width="60%" style="display: block; margin: auto;" /> --- # Workshop Overview - .large[<i class="fa fa-database fa"></i> Welcome to R and the tidyverse] -- - .large[<i class="fa fa-code fa"></i> Data and Visualization] -- - .large[<i class="fa fa-flask fa"></i> Data Wrangling] -- <br><br><br><br><br><br><br> .small[ .center[ [DataScience4Psych.github.io/DataScience4Psych/](https://DataScience4Psych.github.io/DataScience4Psych/) ] ] --- # Introduction to R and the tidyverse .pull-left[ - .large[Getting started with R and RStudio] - .large[Understanding the structure and philosophy of the tidyverse] - .large[Learning basic R syntax and operations] ] --- # Data Visualization .pull-left[ - .large[Creating static and interactive visualizations using ggplot2] - .large[Customizing visual elements to enhance your plots] - .large[Interpreting and describing visualizations] ] .center.pull-right[ <img src="img/plot007w.png" width="95%" style="display: block; margin: auto auto auto 0;" /> ] --- # Data Wrangling .pull-left[ - .large[Importing and exporting data in various formats] - .large[Cleaning, transforming, and reshaping data] ] --- --- class: middle # Hello world! --- ## What is data science? - <i class="fa fa-database fa"></i> + <i class="fa fa-flask fa"></i> = data science? -- - <i class="fa fa-database fa"></i> + <i class="fa fa-code fa"></i> = data science? -- - <i class="fa fa-database fa"></i> + <i class="fa fa-user fa"></i> + <i class="fa fa-code fa"></i> = data science? -- - <i class="fa fa-database fa"></i> + <i class="fa fa-users fa"></i> + <i class="fa fa-code fa"></i> = data science? -- <br> <br> .large[ Data science is an exciting discipline that allows you to turn raw data into understanding, insight, and knowledge. We're going to learn to do this in a `tidy` way -- more on that later! ] --- ## Where is this course? <br> .center[ <img src="d01_02_welcome_toolkit_files/figure-html/unnamed-chunk-7-1.png" width="20%" style="display: block; margin: auto;" /> <br> Notes: [DataScience4Psych.github.io/DataScience4Psych/](https://DataScience4Psych.github.io/DataScience4Psych/) ] <br> .center[ <img src="d01_02_welcome_toolkit_files/figure-html/slides-1.png" width="20%" style="display: block; margin: auto;" /> <br> Slides: [datascience4psych.github.io/slides/](https://datascience4psych.github.io/slides/) ] --- class: center, middle # Meet the toolkit ## ⚒ <br> .center[ <img src="d01_02_welcome_toolkit_files/figure-html/unnamed-chunk-8-1.png" width="20%" style="display: block; margin: auto;" /> <br> Slide: [datascience4psych.github.io/slides/workshop/d01_02_welcome_toolkit.html](https://datascience4psych.github.io/slides/workshop/d01_02_welcome_toolkit.html) ] --- class: middle # Reproducible data analysis --- ## Reproducibility checklist .question[ What does it mean for a data analysis to be "reproducible"? ] -- Near-term goals: - Are the tables and figures reproducible from the code and data? - Does the code actually do what you think it does? - In addition to what was done, is it clear **why** it was done? (e.g., how were parameter settings chosen?) -- Long-term goals: - Can the code be used for other data? - Can you extend the code to do other things? --- ## Toolkit ![toolkit](img/toolkit.png) - Scriptability `\(\rightarrow\)` R - Literate programming (code, narrative, output in one place) `\(\rightarrow\)` R Markdown - Version control `\(\rightarrow\)` Git / GitHub --- class: middle # Toolkit overview --- <img src="img/whole-game-01.png" width="100%" style="display: block; margin: auto;" /> --- <img src="img/whole-game-02.png" width="100%" style="display: block; margin: auto;" /> --- <img src="img/whole-game-03.png" width="100%" style="display: block; margin: auto;" /> --- <img src="img/whole-game-04.png" width="100%" style="display: block; margin: auto;" /> --- class: middle # R and RStudio --- ## What is R/RStudio? - R is a statistical programming language - RStudio is a convenient interface for R (an integrated development environment, IDE) - At its simplest: - R is like a car's engine - RStudio is like a car’s dashboard <img src="img/engine-dashboard.png" width="66%" style="display: block; margin: auto;" /> --- ## Let's take a tour - R / RStudio .center[ ![](img/demo.png) ] - Console - Using R as a calculator - Environment - Loading and viewing a data frame - Accessing a variable in a data frame - R functions --- ## Tour: Working with R at the command line - Launch RStudio/R. - Notice the default panes: - Console (entire left) - Environment/History (tabbed in upper right) - Files/Plots/Packages/Help (tabbed in lower right) -- - FYI: You can change the default location of the panes, among many other things - [Customizing RStudio](https://support.rstudio.com/hc/en-us/articles/200549016-Customizing-RStudio) --- ## Tour: Working with R at the command line (pt 2) - Go into the Console, where we interact with the live R process. - Make an assignment and then inspect the object you just created: ```r x <- 3 * 4 x ``` ``` ## [1] 12 ``` - All R statements where you create objects -- "assignments" -- have this form: ```r objectName <- value ``` - Read this as 'x gets 12' .footnote[and in my head I hear, e.g., "x gets 12". You will make lots of assignments and the operator `<-` is a pain to type. Don't be lazy and use `=`, although it would work, because it will just sow confusion later. Instead, utilize RStudio's keyboard shortcut: Alt + - (the minus sign).] --- ## R essentials A short list (for now): - Functions are (most often) verbs, followed by what they will be applied to in parentheses: ```r do_this(to_this) do_that(to_this, to_that, with_those) ``` -- - Columns (variables) in data frames are accessed with `$`: ```r dataframe$var_name ``` -- - Packages are installed with the `install.packages` function and loaded with the `library` function, once per session: ```r install.packages("package_name") library(package_name) ``` --- ## tidyverse .pull-left[ .center[ <img src="img/hexs.png" width="50%" style="display: block; margin: auto;" /> ] ] .pull-right[ .center[ [tidyverse.org](https://www.tidyverse.org/)] - The tidyverse is an opinionated collection of R packages designed for data science. - All packages share an underlying philosophy and common grammar: the "tidy data" principles. - "Tidy data" has a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. ] --- ## tidyverse .pull-left[ .center[ <img src="img/hexs.png" width="50%" style="display: block; margin: auto;" /> ] ] .pull-right[ .center[ [tidyverse.org](https://www.tidyverse.org/)] - Key packages include: - **ggplot2** for data visualization. - **dplyr** for data manipulation. - **tidyr** for data tidying. - **readr** for data import. - **purrr** for functional programming. - **tibble** for tibble, a modern reimagining of data frames. - **stringr** for string manipulation. - **forcats** for handling categorical variables. - These packages are designed to work together seamlessly. ] --- class: middle # R Markdown --- ## R Markdown - .large[Fully reproducible reports -- each time you knit the analysis is run from the beginning] - .large[Simple markdown syntax for text] - .large[Code goes in chunks, defined by three backticks, while narrative goes outside of chunks] --- ## Let's take a tour - R Markdown .center[ ![](img/demo.png) ] Concepts introduced: - Copying a project - Knitting documents - R Markdown and (some) R syntax --- .your-turn[ - The Bechdel test asks whether a work of fiction features at least two women who talk to each other about something other than a man, and there must be two women named characters. - Go to github page and download the assignment `Bechdel + R Markdown`. - Open and knit the R Markdown document `bechdel.Rmd` and follow along with the instructions. - https://github.com/DataScience4Psych/ae-02-bechdel-rmarkdown ] .center[ <img src="d01_02_welcome_toolkit_files/figure-html/unnamed-chunk-20-1.png" width="20%" style="display: block; margin: auto;" /> ] --- class: middle # Wrapping Up... --- ## R Markdown help .pull-left[ .center[ [R Markdown cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf) ] .center[ <img src="img/rmd-cheatsheet.png" width="70%" style="display: block; margin: auto;" /> ] ] .pull-right[ .center[ Markdown Quick Reference `Help -> Markdown Quick Reference` ] .center[ <img src="img/md-cheatsheet.png" width="90%" style="display: block; margin: auto;" /> ] ] --- ## Workspaces Remember this, and expect it to bite you a few times as you're learning to work with R Markdown: The workspace of your R Markdown document is separate from the Console! - Run the following in the console ```r x <- 2 x * 3 ``` .question[ All looks good, eh? ] - Then, add the following chunk in your R Markdown document ```r x * 3 ``` -- .question[ What happens? Why the error? ] --- class: middle # Wrapping Up... R and R Markdown --- class: middle # Getting help in R --- ## Reading help files <img src="img/r-help.png" width="50%" style="display: block; margin: auto;" /> --- ## Asking good questions .pull-left[ - **Good:** Describe your intention and include your code and the error - **Better:** Describe your intention and create a minimum working example - **Best:** Write a **rep**roducible **ex**ample (reprex) -- we'll introduce this concept more formally and teach you the tools for it a little later in the semester ] -- .pull-right[ ![](https://media.giphy.com/media/uRb2p09vY8lEs/giphy.gif) - Use code formatting - For issues with R code: copy / paste your code and resulting error, don't use screenshots! ] --- # Sources - Mine Çetinkaya-Rundel's Data Science in a Box ([link](https://datasciencebox.org/)) - Kieran Healy's [Data Visualization: A practical introduction](http://socviz.co/appendix.html#a-little-more-about-r) - [Jenny Bryan's Stat545](https://stat545.com) - [Modern Dive](https://moderndive.com/) --- class: middle # Wrapping Up...