class: center, middle, inverse, title-slide .title[ # Welcome to Data Science
for Psychologists
🐱 ] .author[ ### S. Mason Garrison
Assistant Professor
Wake Forest University ] --- class: center, middle # Data Science in R Workshop --- class: middle # Hour 1: Introduction to R and Data Manipulation with base R and the tidyverse --- ## Introduction to R and Data Science --- layout: true <div class="my-footer"> <span> <a href="https://DataScience4Psych.github.io/DataScience4Psych/" target="_blank">Data Science for Psychologists</a> </span> </div> --- <style type="text/css"> .bg_orange { position: relative; z-index: 1; } .bg_orange::before { content: ""; background-image: url("../img/orangebook.png"); background-size: contain; position: absolute; top: 0px; right: 0px; bottom: 0px; left: 0px; opacity: 0.10; z-index: -1; } </style> --- class: middle # Hello world! --- ## What is data science? - <i class="fa fa-database fa"></i> + <i class="fa fa-flask fa"></i> = data science? -- - <i class="fa fa-database fa"></i> + <i class="fa fa-code fa"></i> = data science? -- - <i class="fa fa-database fa"></i> + <i class="fa fa-user fa"></i> + <i class="fa fa-code fa"></i> = data science? -- - <i class="fa fa-database fa"></i> + <i class="fa fa-users fa"></i> + <i class="fa fa-code fa"></i> = data science? -- <br> <br> .large[ Data science is an exciting discipline that allows you to turn raw data into understanding, insight, and knowledge. We're going to learn to do this in a `tidy` way -- more on that later! ] --- # What is this course? This course is an introduction to data science that is designed for psychologists. It emphasizes statistical thinking and best practices. <br><br> -- **Q - What data science background does this course assume?** A - None. <br> -- **Q - Is this an intro CS course?** A - Although statistics and computer science `\(\ne\)` data science, they are very closely related and have tremendous of overlap. Hence, this course is a great way to get comfortable with those topics. However this course is **not** your typical course. <br> -- **Q - Will we be doing computing?** A - Yes. <br> -- **Q - What computing language will we learn?** A - R. <br> -- **Q: Why not language X?** A: We can discuss that *remotely* over ☕. --- ## Where is this course? <br><br><br><br><br><br><br> .large[ .center[ [DataScience4Psych.github.io/DataScience4Psych/](https://DataScience4Psych.github.io/DataScience4Psych/) ] ] --- class: middle # Wrapping up... Hello world! --- class: middle # Data in the wild --- # The US of Bey <img src="img/brooke-watson-us-of-bey.gif" width="55%" style="display: block; margin: auto;" /> .footnote[Brooke Watson, [blog.brooke.science/posts/the-us-of-bey](https://blog.brooke.science/posts/the-us-of-bey/) ] --- # Punctuation in literature <img src="img/julia-silge-punctuation.png" width="45%" style="display: block; margin: auto;" /> .footnote[ Julia Silge, [juliasilge.com/blog/punctution-literature](https://juliasilge.com/blog/punctution-literature/)] --- # Text analysis of Trump's tweets <img src="img/david-robinson-trump-tweets.png" width="61%" style="display: block; margin: auto;" /> .footnote[David Robinson, [varianceexplained.org/r/trump-tweets](http://varianceexplained.org/r/trump-tweets) ] --- ## Greatest Twitter scheme of all time <img src="img/bohemian-rhapsody.png" width="38%" style="display: block; margin: auto;" /> .footnote[ [gist.github.com/mine-cetinkaya-rundel/03d7516dea1e5f2613a5d71c28edb08d](https://gist.github.com/mine-cetinkaya-rundel/03d7516dea1e5f2613a5d71c28edb08d) ] --- ## Voting patterns in the UN [minecr.shinyapps.io/unvotes](https://minecr.shinyapps.io/unvotes/) <img src="img/unvotes.png" width="100%" style="display: block; margin: auto;" /> --- class: middle # Wrapping up... Data in the wild --- class: middle, bg_orange # Course structure and policies --- ## Logistics .pull-left-narrow[ .xlarge[ + Course <br><br> + Professor <br><br> + Assistants <br><br> ] ] .pull-right-wide[ .white[.] <br> {{content}} ] -- + PSY 703: Data Science for Psychologists + Blended Synchronous <br><br> {{content}} -- + S. Mason Garrison + Green 438/Zoom + Office Hours [calendly.com/smasongarrison/](http://www.calendly.com/smasongarrison/) <br><br> {{content}} -- + Tukey 🐱 + Archie 😺 + Annie 😾 {{content}} --- ## Structure - Interactive - Some lectures, lots of learn-by-doing -- - Meet twice a week - Tuesday are Face-to-Face Tutorials - Thursdays are Solidarity Sessions -- - Bring your laptop! 💻 -- - Flipped Lectures - Pre-recorded Lectures - Module is finalized by the week's start --- # Big Ideas .pull-left[ - Reproducibility; - Replication; - Robust Methods; - Really Nice Visualization; and - R ] .center.pull-right[ <img src="img/plot007w.png" width="95%" style="display: block; margin: auto auto auto 0;" /> ] --- # Categories of Topics .large.pull-left-narrow[ - What - How ] .pull-right-wide[ <img src="img/p0.png" width="60%" height="200%" style="display: block; margin: auto;" /> ] --- # Learning Outcomes - Using R to visualize and model many kinds of data. - Given a dataset, - able to visualize - generate hypotheses - investigate those hypotheses, & - communicate your results. --- # Materials - Textbook - R for Data Science: Import, Tidy, Transform, Visualize, and Model Data - Hadley Wickham & Garrett Grolemund - Online Edition - r4ds.had.co.nz -- - Coursenotes - [datascience4psych.github.io/DataScience4Psych](https://datascience4psych.github.io/DataScience4Psych/) -- - Software - R - Rstudio, - Github --- # Milestones - Labs - Portfolio - Presentation --- # Contract Grading - What is contract grading? - Assessment based on effort - More representative of the scientific process - Specifics are in the syllabus and course notes --- ## Diversity & Inclusion: .midi[ **Intent:** Students from all diverse backgrounds and perspectives be well-served by this course, that students' learning needs be addressed both in and out of class, and that the diversity that the students bring to this class be viewed as a resource, strength and benefit. It is my intent to present materials and activities that are respectful of diversity: gender identity, sexuality, disability, age, socioeconomic status, ethnicity, race, nationality, religion, and culture. Let me know ways to improve the effectiveness of the course for you personally, or for other students or student groups. ] -- .midi[ - If you have a name or set of pronouns that differ from those that appear in your official records, please let me know. - If you feel your performance is being impacted by your experiences outside of class, please don't hesitate to come and talk with me. If you prefer to speak with someone outside of the course, your advisor is an excellent resource. - I (like many people) am still in the process of learning about diverse perspectives/identities. If something was said in class (by anyone) that made you feel uncomfortable, please talk to me about it. ] --- ## How to get help - Course content, logistics, etc. discussion on the course discussion forum. - Please post on the FAQ instead of direct messaging. - Use proper formatting: When asking questions involving code, please make sure to use inline code formatting for short bits of code or code snippets for longer, multi-line chunks. - Often it's a lot more pleasant an experience to get your questions answered in person. Make use of my *remote* office hours, I'm here to help! --- ## Tips for asking questions - First search existing discussion for answers. If the question has already been answered, you're done! If it has already been asked but you're not satisfied with the answer, add to the thread. - Give your question context from course concepts not course assignments. - Good context: "I have a question on filtering" - Bad context: "I have a question on HW 1 question 4" - Be precise in your description: - Good description: "I am getting the following error and I'm not sure how to resolve it - `Error: could not find function "ggplot"`" - Bad description: "R giving errors, help me! Aaaarrrrrgh!” --- ## More Tips for asking questions - You can edit a question after posting it. - Format your questions nicely using markdown and code formatting. - Where appropriate, provide links to specific files, or even lines. - Sharing code will help others understand your question. --- ## Sharing/reusing code - I am well aware that a huge volume of code is available on the web to solve any number of problems. - Unless I explicitly tell you not to use something, you can use any online resources (e.g. StackOverflow, RStudio Community), but you must explicitly cite where you obtained any code you directly use (or use as inspiration). - You are welcome to discuss the problems together and ask for advice, but you may not create code for your classmates. - You won't learn anything if other people write your code for you! --- class: middle # Wrapping Up... --- class: middle # Your Turn! --- ## Voting patterns in the UN <img src="img/unvotes.png" width="100%" style="display: block; margin: auto;" /> [minecr.shinyapps.io/unvotes](https://minecr.shinyapps.io/unvotes/) --- ## Create a GitHub account .small[ .instructions[ Go to [github.com](https://github.com/), and create an account (unless you already have one). ] ] Tips for selecting a username:<sup>✦</sup> .small[ - Incorporate your actual name. - Reuse username from other contexts, e.g., Twitter or Slack. - Pick a username you'll be comfortable revealing to your future boss. - Shorter is better than longer. - Be as unique as possible in as few characters as possible. - Make it timeless. Don't highlight your current university, employer, etc. - Avoid words laden with special meaning in programming, like `NA`. ] -- .small[ <sup>✦</sup> Source: [Happy git with R](http://happygitwithr.com/github-acct.html#username-advice) by Jenny Bryan ] --- .your-turn[ **Option 1.** - On Github, download the assignment called `AE01a - UN Votes`. - In the Files pane in the bottom right corner, spot the file called `unvotes.Rmd`. Open it, and then click on the "Knit" button. - Go back to the file and change your name on top (in the `yaml` -- we'll talk about what this means later) and knit again. - Change the country names to those you're interested in. Your spelling and capitalization should match how the countries appear in the data, so take a peek at the Appendix to confirm spelling. Knit again. Voila, your first data visualization! ] --- .your-turn[ **Option 2.** - On Github, download the assignment called called `AE01b_COVID`. - In the Files pane in the bottom right corner, spot the file called `covid.Rmd`. Open it, and then click on the "Knit" button. - Go back to the file and change your name on top (in the `yaml` -- we'll talk about what this means later) and knit again. - Change the country names to those you're interested in. Your spelling and capitalization should match how the countries appear in the data, so take a peek at the Appendix to confirm spelling. Knit again. Voila, your first data visualization! ] --- class: middle # Wrapping Up... --- ## Data Manipulation with base R ... --- ## Data Wrangling with dplyr ... --- ## Reshaping Data with tidyr ... --- ## Hands-on Exercise: Data Manipulation ... --- class: middle # Hour 2: Data Wrangling and Transformation --- ## Recap of Data Manipulation with the tidyverse ... --- ## Handling Missing Data ... --- ## Data Transformation ... --- ## Handling Categorical Variables ... --- ## Combining and Merging Datasets ... --- ## Case Study and Practical Exercises ... --- class: middle # Hour 3: Data Visualization and Communication --- ## Introduction to Data Visualization ... --- ## Introduction to ggplot2 ... --- ## Creating Basic Visualizations ... --- ## Customizing Plots ... --- ## Creating Advanced Visualizations ... --- ## Interactive Visualizations ... --- ## Best Practices for Data Visualization ... --- ## Hands-on Exercise: Data Visualization ... --- class: middle # Wrapping Up ... ```