In this mini analysis, we’ll work with the data used in the FiveThirtyEight story titled “The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women”.

Data and packages

We start by loading the packages we’ll use.

library(fivethirtyeight)
library(tidyverse)

The dataset contains information on 1794 movies released between 1970 and 2013. However, we’ll focus our analysis on movies released between 1990 and 2013.

bechdel90_13 <- bechdel %>%
  filter(between(year, 1990, 2013))

There are 1,615 such movies.
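The number of movies in the filtered data can be computed rather than typed in by hand. A minimal sketch, which repeats the filter so it runs on its own; `n_movies` is a name introduced here for illustration:

```r
library(fivethirtyeight)
library(tidyverse)

# Same filter as above: keep movies released from 1990 through 2013
bechdel90_13 <- bechdel %>%
  filter(between(year, 1990, 2013))

# n_movies is a hypothetical helper name, not part of the original analysis
n_movies <- nrow(bechdel90_13)
n_movies
```

In an R Markdown document the same value could be placed in the sentence with inline code instead of a hard-coded number.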

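The financial variables we’ll focus on are budget_2013 (budget), domgross_2013 (domestic gross), and intgross_2013 (international gross), each in 2013 inflation-adjusted dollars.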

We’ll also use the binary and clean_test variables for grouping.
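Before grouping, it can help to see how the two grouping variables relate to each other. A small sketch, assuming only the `binary` and `clean_test` columns used later in the analysis; `test_counts` is a name introduced here:

```r
library(fivethirtyeight)
library(tidyverse)

bechdel90_13 <- bechdel %>%
  filter(between(year, 1990, 2013))

# Count movies for each combination of the detailed and binary results.
# test_counts is a hypothetical name used only in this sketch.
test_counts <- bechdel90_13 %>%
  count(clean_test, binary)
test_counts
```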

Analysis

Let’s take a look at how median budget and gross vary by whether the movie passed the Bechdel test.

bechdel90_13 %>%
  group_by(binary) %>%
  summarise(
    med_budget = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
  )
## # A tibble: 2 × 4
##   binary med_budget med_domgross med_intgross
##   <chr>       <dbl>        <dbl>        <dbl>
## 1 FAIL    48385984.    57318606.    104475669
## 2 PASS    31070724     45330446.     80124349

Next, let’s take a look at how median budget and gross vary by a more detailed indicator of the Bechdel test result (ok = passes test, dubious, men = women only talk about men, notalk = women don’t talk to each other, nowomen = fewer than two women).

bechdel90_13 %>%
  # ____ %>%
  summarise(
    med_budget = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
  )
## # A tibble: 1 × 3
##   med_budget med_domgross med_intgross
##        <int>        <dbl>        <dbl>
## 1   37878971     52270207     93523336
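The output above has a single row because the grouping step in the code is still a placeholder, so the medians are computed over all movies at once. A sketch of one way to fill it in, assuming the intended grouping variable is `clean_test` (the detailed result described in the text); `medians_by_test` is a name introduced here:

```r
library(fivethirtyeight)
library(tidyverse)

bechdel90_13 <- bechdel %>%
  filter(between(year, 1990, 2013))

# Assumption: the blank step is group_by(clean_test)
medians_by_test <- bechdel90_13 %>%
  group_by(clean_test) %>%
  summarise(
    med_budget = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
  )
medians_by_test
```

With the grouping in place, the summary has one row per level of `clean_test` rather than one row overall.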

In order to evaluate how return on investment varies among movies that pass and fail the Bechdel test, we’ll first create a new variable called roi as the ratio of the gross to budget.

bechdel90_13 <- bechdel90_13 %>%
  mutate(roi = intgross_2013 / domgross_2013)

Let’s see which movies have the highest return on investment.

bechdel90_13 %>%
  arrange(desc(roi)) %>%
  select(title, clean_test, binary, roi, budget_2013, intgross_2013)
## # A tibble: 1,615 × 6
##    title                      clean_test binary    roi budget_2013 intgross_2013
##    <chr>                      <ord>      <chr>   <dbl>       <int>         <dbl>
##  1 Tropa de Elite             ok         PASS   1638.      7345604      16088238
##  2 St. Trinian's              ok         PASS   1496.     12808396      25219695
##  3 Jin ling shi san chai      ok         PASS    301.    103569079      97066426
##  4 Chinjeolhan geumjassi      ok         PASS    111.      5368649      28002720
##  5 Che: Part One              notalk     FAIL    103.     62770866      32595998
##  6 Shaolin Soccer             nowomen    FAIL     87.5    13158460      56286669
##  7 Mononoke-hime              ok         PASS     63.3    29024763     218193652
##  8 Agora                      notalk     FAIL     62.9    76001212      42335163
##  9 Perfume: The Story of a M… ok         PASS     60.1    73624227     154418394
## 10 Centurion                  notalk     FAIL     53.6    16023478       7075508
## # ℹ 1,605 more rows
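The roi values in this table do not look like gross divided by budget. Centurion, for example, has an international gross (7,075,508) below its budget (16,023,478) yet an roi of 53.6. The code divides intgross_2013 by domgross_2013, so movies with a very small domestic gross rise to the top. A sketch of a budget-based version follows, assuming "gross" means `intgross_2013`; whether domestic gross should also be added to the numerator is a choice the text does not settle. The names `bechdel_roi` and `roi_budget` are introduced here for illustration.

```r
library(fivethirtyeight)
library(tidyverse)

bechdel90_13 <- bechdel %>%
  filter(between(year, 1990, 2013))

# Assumption: return on investment = international gross / budget,
# both in 2013 inflation-adjusted dollars
bechdel_roi <- bechdel90_13 %>%
  mutate(roi_budget = intgross_2013 / budget_2013)

# Movies with the highest budget-based ratio first (NA values sort last)
bechdel_roi %>%
  arrange(desc(roi_budget)) %>%
  select(title, clean_test, binary, roi_budget, budget_2013, intgross_2013)
```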

Below is a visualization of the return on investment by test result. However, a few extreme observations make the distributions difficult to see.

ggplot(data = bechdel90_13, mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(
    title = "Return on investment vs. Bechdel test result",
    x = "Detailed Bechdel result",
    y = "Return on investment",
    color = "Binary Bechdel result"
  )

Zooming in on the movies with roi < 10 provides a better view of how the medians across the categories compare:

ggplot(data = bechdel90_13, mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  ylim(0, 10) +
  labs(
    title = "Return on investment vs. Bechdel test result",
    subtitle = "___",
    x = "Detailed Bechdel result",
    y = "Return on investment",
    color = "Binary Bechdel result"
  )
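One caveat with `ylim(0, 10)`: in ggplot2, setting scale limits drops observations outside the range before the boxplot statistics are computed, so the medians and quartiles shown describe only the retained movies. A sketch of an alternative, assuming a purely visual zoom is wanted, uses `coord_cartesian()`, which keeps all observations in the calculation and only restricts the visible region. The roi column is recomputed exactly as in the code above so the block runs on its own; `p_zoom` is a name introduced here.

```r
library(fivethirtyeight)
library(tidyverse)

bechdel90_13 <- bechdel %>%
  filter(between(year, 1990, 2013)) %>%
  mutate(roi = intgross_2013 / domgross_2013)  # same definition as in the text

# coord_cartesian() zooms the view without discarding data,
# so box statistics are based on every movie with a finite roi
p_zoom <- ggplot(bechdel90_13, aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 10)) +
  labs(
    title = "Return on investment vs. Bechdel test result",
    x = "Detailed Bechdel result",
    y = "Return on investment",
    color = "Binary Bechdel result"
  )
p_zoom
```

Which behavior is preferable depends on the question: `ylim()` answers "what do movies with roi up to 10 look like?", while `coord_cartesian()` shows the lower part of the full distributions.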

References

  1. Assignment adapted from Mine Cetinkaya-Rundel’s Data Science in a Box