class: center, middle, inverse, title-slide

.title[
# Visualizing categorical data
🐱
]
.author[
### S. Mason Garrison
]

---
layout: true

<div class="my-footer">
<span>
<a href="https://DataScience4Psych.github.io/DataScience4Psych/" target="_blank">Data Science for Psychologists</a>
</span>
</div>

---
class: middle

# Recap

---

## Variables

- **Numerical** variables can be classified as **continuous** or **discrete** based on whether or not the variable can take on an infinite number of values or only non-negative whole numbers, respectively.
- If the variable is **categorical**, we can determine if it is **ordinal** based on whether or not the levels have a natural ordering.

---

### Remember this Data?

``` r
library(tidyverse)
starwars
```

```
## # A tibble: 87 × 14
##    name   height  mass hair_color skin_color eye_color birth_year
##    <chr>   <int> <dbl> <chr>      <chr>      <chr>          <dbl>
##  1 Luke …    172    77 blond      fair       blue            19
##  2 C-3PO     167    75 <NA>       gold       yellow         112
##  3 R2-D2      96    32 <NA>       white, bl… red             33
##  4 Darth…    202   136 none       white      yellow          41.9
##  5 Leia …    150    49 brown      light      brown           19
##  6 Owen …    178   120 brown, gr… light      blue            52
##  7 Beru …    165    75 brown      light      blue            47
##  8 R5-D4      97    32 <NA>       white, red red             NA
##  9 Biggs…    183    84 black      light      brown           24
## 10 Obi-W…    182    77 auburn, w… fair       blue-gray       57
## # ℹ 77 more rows
## # ℹ 7 more variables: sex <chr>, gender <chr>, homeworld <chr>,
## #   species <chr>, films <list>, vehicles <list>,
## #   starships <list>
```

---

### Perhaps now?
``` r
glimpse(starwars)
```

```
## Rows: 87
## Columns: 14
## $ name       <chr> "Luke Skywalker", "C-3PO", "R2-D2", "Darth V…
## $ height     <int> 172, 167, 96, 202, 150, 178, 165, 97, 183, 1…
## $ mass       <dbl> 77.0, 75.0, 32.0, 136.0, 49.0, 120.0, 75.0, …
## $ hair_color <chr> "blond", NA, NA, "none", "brown", "brown, gr…
## $ skin_color <chr> "fair", "gold", "white, blue", "white", "lig…
## $ eye_color  <chr> "blue", "yellow", "red", "yellow", "brown", …
## $ birth_year <dbl> 19.0, 112.0, 33.0, 41.9, 19.0, 52.0, 47.0, N…
## $ sex        <chr> "male", "none", "none", "male", "female", "m…
## $ gender     <chr> "masculine", "masculine", "masculine", "masc…
## $ homeworld  <chr> "Tatooine", "Tatooine", "Naboo", "Tatooine",…
## $ species    <chr> "Human", "Droid", "Droid", "Human", "Human",…
## $ films      <list> <"A New Hope", "The Empire Strikes Back", "…
## $ vehicles   <list> <"Snowspeeder", "Imperial Speeder Bike">, <…
## $ starships  <list> <"X-wing", "Imperial shuttle">, <>, <>, "TI…
```

---

#### Recode hair color

``` r
# Keep three named hair colors; every other non-missing value becomes "Other"
starwars <- starwars %>%
  mutate(hair_color2 = fct_other(hair_color,
    keep = c("black", "brown", "blond")
  ))
```

---
class: middle

# Bar plot

---

## Bar plot

``` r
ggplot(data = starwars, mapping = aes(x = gender)) +
  geom_bar()
```

<img src="d06_vizcat_files/figure-html/unnamed-chunk-5-1.png" width="55%" style="display: block; margin: auto;" />

---

## Segmented bar plot: counts

``` r
ggplot(data = starwars, mapping = aes(x = gender,
*                                     fill = hair_color))+
  geom_bar()
```

<img src="d06_vizcat_files/figure-html/unnamed-chunk-6-1.png" width="55%" style="display: block; margin: auto;" />

---

## Segmented bar plots

``` r
ggplot(data = starwars, mapping = aes(x = gender,
*                                     fill = hair_color2))+
  geom_bar()
```

<img src="d06_vizcat_files/figure-html/unnamed-chunk-7-1.png" width="55%" style="display: block; margin: auto;" />

---

## Segmented bar plots

``` r
ggplot(data = starwars, mapping = aes(x = gender,
*                                     fill = hair_color2))+
* geom_bar()+
  coord_flip()
```

<img
src="d06_vizcat_files/figure-html/unnamed-chunk-8-1.png" width="55%" style="display: block; margin: auto;" />

---

## Segmented bar plots: proportions

``` r
ggplot(data = starwars, mapping = aes(x = gender, fill = hair_color2)) +
  geom_bar(position = "fill") +
  coord_flip() +
  labs(y = "proportion")
```

<img src="d06_vizcat_files/figure-html/unnamed-chunk-9-1.png" width="45%" style="display: block; margin: auto;" />

---

.question[
Which bar plot is a more useful representation for visualizing the relationship between gender and hair color?
]

.pull-left[
<img src="d06_vizcat_files/figure-html/unnamed-chunk-10-1.png" width="100%" style="display: block; margin: auto;" />
]
.pull-right[
<img src="d06_vizcat_files/figure-html/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" />
]

---

## Customizing bar plots

.pull-left[
<img src="d06_vizcat_files/figure-html/unnamed-chunk-12-1.png" width="60%" style="display: block; margin: auto;" />
]
.pull-right[.small[

``` r
*ggplot(starwars, aes(y = gender, fill = hair_color2)) +
  geom_bar(position = "fill") +
* labs(
*   x = "Proportion",
*   y = "Gender",
*   fill = "Hair Color",
*   title = "Hair Colors of Starwars",
*   subtitle = "by gender"
* )
```
]
]

---

.your-turn[
Time to actually play around with the Star Wars dataset!

- Go to the class git repo ([github.com/DataScience4Psych](https://github.com/DataScience4Psych)) and start `AE 03 - StarWars + Data visualization`.
- Open the R Markdown document and complete the exercise (and, if time allows, the stretch goal exercise).
]

---
class: middle

# Relationships between numerical and categorical variables

---

## Already talked about...
- Coloring and faceting histograms and density plots
- Side-by-side box plots

---

## Violin plots

``` r
# Assumption: `loans` is not created anywhere in this deck; here it is
# taken from the loans_full_schema dataset in the openintro package.
loans <- openintro::loans_full_schema
ggplot(loans, aes(x = homeownership, y = loan_amount)) +
  geom_violin()
```

<img src="d06_vizcat_files/figure-html/unnamed-chunk-13-1.png" width="60%" style="display: block; margin: auto;" />

---

## Ridge plots

``` r
library(ggridges)
ggplot(loans, aes(x = loan_amount, y = grade, fill = grade, color = grade)) +
  geom_density_ridges(alpha = 0.5)
```

<img src="d06_vizcat_files/figure-html/unnamed-chunk-14-1.png" width="60%" style="display: block; margin: auto;" />

---

# Sources

- Mine Çetinkaya-Rundel's Data Science in a Box ([link](https://datasciencebox.org/))

---
class: middle

# Wrapping Up...
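
As a recap, the recode-then-plot workflow from this deck can be sketched in a single chunk. This is a minimal sketch rather than the deck's own code: the names `starwars_recoded` and `p` are hypothetical, chosen so the built-in `starwars` tibble is not overwritten, and the axis labels are illustrative.

``` r
library(tidyverse)

# Step 1: recode hair color, keeping three named colors and
# lumping every other non-missing value into "Other"
starwars_recoded <- starwars %>%
  mutate(hair_color2 = fct_other(hair_color,
    keep = c("black", "brown", "blond")
  ))

# Step 2: segmented bar plot of proportions; mapping gender to y
# gives horizontal bars without needing coord_flip()
p <- ggplot(starwars_recoded, aes(y = gender, fill = hair_color2)) +
  geom_bar(position = "fill") +
  labs(x = "Proportion", y = "Gender", fill = "Hair color")

p
```

With `position = "fill"`, each bar is scaled to a total of 1, so the plot compares the hair color composition within each gender rather than raw counts.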