class: center, middle, inverse, title-slide

.title[
# Meet the toolkit
⚒
]
.author[
### S. Mason Garrison
]

---
layout: true

<div class="my-footer">
<span>
<a href="https://DataScience4Psych.github.io/DataScience4Psych/" target="_blank">Data Science for Psychologists</a>
</span>
</div>

---
class: middle

# Reproducible data analysis

---

## Reproducibility checklist

.question[
What does it mean for a data analysis to be "reproducible"?
]

--

Near-term goals:

- Are the tables and figures reproducible from the code and data?
- Does the code actually do what you think it does?
- In addition to what was done, is it clear **why** it was done? (e.g., how were parameter settings chosen?)

Long-term goals:

- Can the code be used for other data?
- Can you extend the code to do other things?

---

## Toolkit

![toolkit](img/toolkit.png)

- Scriptability `\(\rightarrow\)` R
- Literate programming (code, narrative, output in one place) `\(\rightarrow\)` R Markdown
- Version control `\(\rightarrow\)` Git / GitHub

---
class: middle

# Toolkit overview

---

<img src="img/whole-game-01.png" width="100%" style="display: block; margin: auto;" />

---

<img src="img/whole-game-02.png" width="100%" style="display: block; margin: auto;" />

---

<img src="img/whole-game-03.png" width="100%" style="display: block; margin: auto;" />

---

<img src="img/whole-game-04.png" width="100%" style="display: block; margin: auto;" />

---
class: middle

# R and RStudio

---

## What is R/RStudio?

- R is a statistical programming language
- RStudio is a convenient interface for R (an integrated development environment, IDE)
- At its simplest:
  - R is like a car’s engine
  - RStudio is like a car’s dashboard

<img src="img/engine-dashboard.png" width="66%" style="display: block; margin: auto;" />

---

## Let's take a tour - R / RStudio

.center[
![](../img/demo.png)
]

- Console
- Using R as a calculator
- Environment
- Loading and viewing a data frame
- Accessing a variable in a data frame
- R functions

---

## Working with R at the command line

- Launch RStudio/R.
- Notice the default panes:
  - Console (entire left)
  - Environment/History (tabbed in upper right)
  - Files/Plots/Packages/Help (tabbed in lower right)

--

- FYI: You can change the default location of the panes, among many other things
  - [Customizing RStudio](https://support.rstudio.com/hc/en-us/articles/200549016-Customizing-RStudio)

---

## Working with R at the command line (pt 2)

- Go into the Console, where we interact with the live R process.
- Make an assignment and then inspect the object you just created:

``` r
x <- 3 * 4
x
```

```
## [1] 12
```

- All R statements where you create objects -- "assignments" -- have this form:

``` r
objectName <- value
```

- Read `<-` as "gets": `x <- 3 * 4` reads as 'x gets 12'

<!--- and in my head I hear, e.g., "x gets 12". You will make lots of assignments and the operator `<-` is a pain to type. Don't be lazy and use `=`, although it would work, because it will just sow confusion later. Instead, utilize RStudio's keyboard shortcut: Alt + - (the minus sign). -->

---

## R essentials

A short list (for now):

- Functions are (most often) verbs, followed by what they will be applied to in parentheses:

``` r
do_this(to_this)
do_that(to_this, to_that, with_those)
```

--

- Columns (variables) in data frames are accessed with `$`:

``` r
dataframe$var_name
```

--

- Packages are installed once with the `install.packages` function and then loaded with the `library` function once per session:

``` r
install.packages("package_name")
library(package_name)
```

---

## tidyverse

.pull-left[
![](img/tidyverse.png)
]

.pull-right[
.center[
[tidyverse.org](https://www.tidyverse.org/)
]

- The tidyverse is an opinionated collection of R packages designed for data science.
- All packages share an underlying philosophy and a common grammar.
]

---
class: middle

# R Markdown

---

## R Markdown

- Fully reproducible reports -- each time you knit, the analysis is run from the beginning
- Simple markdown syntax for text
- Code goes in chunks, defined by three backticks; narrative goes outside of chunks

---

## Let's take a tour - R Markdown

.center[
![](../img/demo.png)
]

Concepts introduced:

- Copying a project of mine
- Knitting documents
- R Markdown and (some) R syntax

---

.your-turn[
- The Bechdel test asks whether a work of fiction features at least two named women characters who talk to each other about something other than a man.
- Go to the GitHub page and fork the assignment `Bechdel + R Markdown`.
- Open and knit the R Markdown document `bechdel.Rmd` and follow along with the instructions.
]

---
class: middle

# Wrapping Up...

---

## R Markdown help

.pull-left[
.center[
[R Markdown cheat sheet](https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf)
]
![](img/rmd-cheatsheet.png)
]

.pull-right[
.center[
Markdown Quick Reference
`Help -> Markdown Quick Reference`
]
![](img/md-cheatsheet.png)
]

---

## Workspaces

Remember this, and expect it to bite you a few times as you're learning to work with R Markdown: The workspace of your R Markdown document is separate from the Console!

- Run the following in the console

``` r
x <- 2
x * 3
```

.question[
All looks good, eh?
]

- Then, add the following chunk in your R Markdown document

``` r
x * 3
```

.question[
What happens? Why the error?
]

---

## How will we use R Markdown?

- Every assignment / report / project / etc. is an R Markdown document
- You'll have a template R Markdown document to start with
- The amount of scaffolding in the template will decrease over the semester

---
class: middle

# Wrapping Up... R and R Markdown

---
class: middle

# Git and GitHub

---

## Version control

- We introduced GitHub as a platform for collaboration
- But it's much more than that...
- It's actually designed for version control

---

## Versioning

<img src="img/lego-steps.png" width="70%" style="display: block; margin: auto;" />

---

## Versioning with human readable messages

<img src="img/lego-steps-commit-messages.png" width="70%" style="display: block; margin: auto;" />

---

## Why do we need version control?

<img src="img/phd_comics_vc.gif" width="40%" style="display: block; margin: auto;" />

---

# Git and GitHub tips

- Git is a version control system -- like the “Track Changes” feature from Microsoft Word, on steroids. GitHub is the home for your Git-based projects on the internet (like Dropbox, but much, much better).

--

- There are millions of Git commands -- ok, that's an exaggeration, but there are a lot of them -- and very few people know them all. 99% of the time you will use Git to add, commit, push, and pull.

--

- We will be doing Git things and interfacing with GitHub through RStudio, but if you google for help you might come across methods for doing these things in the command line -- skip that and move on to the next resource unless you feel comfortable trying it out.

--

- There is a great resource for working with Git and R: [happygitwithr.com](http://happygitwithr.com/). Some of the content in there is beyond the scope of this course, but it's a good place to look for help.

---

## Let's take a tour - Git and GitHub

.center[
![](../img/demo.png)
]

- Connect an R project to a GitHub repository
- Working with a local and remote repository
- Making a change locally, committing, and pushing
- Making a change on GitHub and pulling
- There is a bit more to GitHub that we'll use in this class, but for today this is enough

---
class: middle

# Wrapping Up... Git and GitHub

---
class: middle

# Getting help in R

---

## Reading help files

<img src="img/r-help.png" width="50%" style="display: block; margin: auto;" />

---

## Asking good questions

.pull-left[
- **Good:** Describe your intention and include your code and the error
- **Better:** Describe your intention and create a minimal working example
- **Best:** Write a **rep**roducible **ex**ample (reprex) -- we'll introduce this concept more formally and teach you the tools for it a little later in the semester
]

--

.pull-right[
![](https://media.giphy.com/media/uRb2p09vY8lEs/giphy.gif)

- Use code formatting
- For issues with R code: copy / paste your code and the resulting error, don't use screenshots!
]

---

# Sources

- Mine Çetinkaya-Rundel's Data Science in a Box ([link](https://datasciencebox.org/))
- Kieran Healy's [Data Visualization: A practical introduction](http://socviz.co/appendix.html#a-little-more-about-r)
- [Jenny Bryan's Stat545](https://stat545.com)
- [Modern Dive](https://moderndive.com/)

---
class: middle

# Wrapping Up... Getting help in R
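
---

## A minimal example to share when asking for help

The "Better" advice from *Asking good questions* can be sketched roughly as follows. This is only an illustration, not a formal reprex: it assumes base R and the built-in `mtcars` data, and `heavy_cars` is a hypothetical object name chosen for this example.

``` r
# Illustrative only: a small, self-contained snippet someone else can
# paste and run, because it uses a dataset that ships with R.

# State the intention: "I want the average mpg of the heaviest cars."
# Keep rows where weight (wt, in 1000 lbs) is above 5.
heavy_cars <- mtcars[mtcars$wt > 5, ]

# Inspect what was created, using `$` to access a column.
nrow(heavy_cars)
mean(heavy_cars$mpg)
```

Because the snippet does not depend on files or objects from your own session, a helper can reproduce exactly what you see.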