73 Lab: Modeling professor attractiveness and course evaluations

Why are hot professors “better” teachers?

At the end of most college courses, students are asked to evaluate the class and the instructor—usually anonymously, often hastily, sometimes with one hand already on the doorknob. These are often used to assess instructor effectiveness, allocate merit raises, and sometimes even decide whether people keep their jobs.

But are course evaluations actually measuring teaching quality? Or are they picking up on other things—like a professor’s appearance?

In a now-classic economics paper, Daniel Hamermesh and Amy Parker looked at whether professors who are considered more physically attractive get higher evaluation scores. The short answer? Yeah, they do. You can read their study here.¹ The dataset we'll use comes from a slightly modified version of the replication data included with Data Analysis Using Regression and Multilevel/Hierarchical Models by Gelman and Hill.

In this lab, you’ll explore that dataset—focusing on one predictor at a time—to get a feel for how linear models behave, how to interpret them, and how to visualize their results. Along the way, you’ll also get a preview of just how messy “evaluation” can be when the outcome depends on variables that have nothing to do with teaching.

Packages

We’ll use tidyverse, openintro, and broom to wrangle, model, and tidy up our regression output.

library(tidyverse)
library(broom)
library(openintro)

The data

The dataset we’ll be using is called evals from the openintro package. Take a peek at the codebook with ?evals.
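
If you want a quick overview of the variables before digging into the codebook, one option (assuming the packages above are loaded) is:

# A quick look at the variables and their types
glimpse(evals)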

Exercises

Part 1: Getting to know the outcome

Before we model anything, let’s look at what students actually do with the evaluation scale.

  1. Visualize the distribution of score. Is the distribution skewed? Are students generous? Harsh? Do most scores pile up near the top? Include at least one plot and a couple of numerical summaries, and tell me what you notice.
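
If you're not sure where to start, here is a minimal sketch of one possible approach; the binwidth and the particular summaries are just suggestions.

# One way to look at the distribution of score
ggplot(evals, aes(x = score)) +
  geom_histogram(binwidth = 0.25)

# A few numerical summaries to go with the plot
evals |>
  summarise(
    mean_score   = mean(score),
    median_score = median(score),
    sd_score     = sd(score)
  )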

Now let’s introduce the main character in this dataset: bty_avg. This variable is a professor’s average beauty rating.

  2. Create a scatterplot of score versus bty_avg. Trend? Noise? Clusters? Total chaos? Don’t overthink it; just describe what you see.

Hint: If you need help with the plotting functions, see the ggplot2 reference index at http://ggplot2.tidyverse.org/reference/index.html.

  3. Recreate your scatterplot, but use geom_jitter() instead of geom_point(). What does jittering fix here, and what would someone misunderstand if they only saw the non-jittered plot?
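
Here is a sketch of both versions of the plot so you can compare them directly; tweak the aesthetics however you like.

# The plain scatterplot
ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_point()

# The same plot with jittering
ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_jitter()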

Part 2: Beauty as a predictor

Recall: a linear model has the form \(\hat{y} = b_0 + b_1 x\).

Let’s see if the apparent trend in the plot is something more than natural variation.

  1. Fit a linear model called m_bty to predict average professor evaluation score by average beauty rating (bty_avg). Write the fitted regression equation, and then interpret the slope in plain language: What does a one-point increase in beauty predict for evaluation scores?

  2. Replot your existing visualization, this time adding a regression line in orange and turning off the shading that appears around the line by default. (See the sketch after this list for one way to do this.)

  3. Then, stepping back, interpret what this model is really saying:

  • How much do evaluation scores change with beauty ratings (slope)?
  • What does the intercept represent here? Is it meaningful in this context, or just a mathematical artifact? Explain your reasoning.
  • What is the \(R^2\) value of this model? Interpret it in context. What does it tell you about whether beauty is a “big deal” or a pretty minor piece of the story?
  • That shading you turned off represents uncertainty around the line—why might it be easy to read too much into it at this stage?
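
Here is one way to approach the model fit and the plot from the exercises above; it's a sketch, not the only answer. The orange color and the se = FALSE choice come straight from the instructions.

# Fit the single-predictor model and look at its coefficients and R-squared
m_bty <- lm(score ~ bty_avg, data = evals)
tidy(m_bty)
glance(m_bty)$r.squared

# Scatterplot with the regression line in orange and no shading
ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = FALSE, color = "orange")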

Part 3: Linear regression with a categorical predictor

Let’s switch gears from numeric predictors to categorical ones. Beauty scores might be (somewhat) continuous, but characteristics like gender and rank are categorical, meaning they fall into distinct groups.

We’ll start by seeing whether evaluation scores differ by gender.

m_gen <- lm(score ~ gender, data = evals)
tidy(m_gen)
#> # A tibble: 2 × 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)    4.09     0.0387    106.   0      
#> 2 gendermale     0.142    0.0508      2.78 0.00558
  1. Take a look at the model output. What’s the reference level? What do the coefficients tell you about how evaluation scores differ between male and female professors? Write the predicted mean evaluation score for each gender.

Now let’s do one more categorical predictor: rank.

Actually, let’s do three slightly different versions of this model to see how changing the reference level or regrouping categories affects the output.

Create a new variable from rank called rank_relevel where "tenure track" is the baseline level. Create another new variable called tenure_eligible that labels "teaching" faculty as "no" and labels "tenure track" and "tenured" faculty as "yes".
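
Here is a sketch of one way to create those two variables (fct_relevel() comes from forcats, which loads with the tidyverse):

# Add the releveled and regrouped versions of rank to the data
evals <- evals |>
  mutate(
    rank_relevel    = fct_relevel(rank, "tenure track"),
    tenure_eligible = if_else(rank == "teaching", "no", "yes")
  )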

  2. Fit three different linear models to explore how rank affects evaluation scores. Each model should use a different version of the rank variable (rank, rank_relevel, and tenure_eligible) to predict average professor evaluation score. (A sketch appears after this list.)

  3. Based on the regression outputs, interpret how teaching faculty and tenured faculty differ from the baseline. (Hint: interpret the slopes and intercepts for all three models in the context of the data.)

  4. Report the \(R^2\). Does rank explain much, or not really?
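
A sketch of the three fits described above, assuming you created rank_relevel and tenure_eligible as shown earlier (the model names are placeholders):

# One model per version of the rank variable
m_rank            <- lm(score ~ rank,            data = evals)
m_rank_relevel    <- lm(score ~ rank_relevel,    data = evals)
m_tenure_eligible <- lm(score ~ tenure_eligible, data = evals)

# Inspect one of them; repeat for the others
tidy(m_rank)
glance(m_rank)$r.squared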

Part 4: Multiple linear regression

Now we’re going to ask a more interesting question:

Is the “beauty effect” still there once we account for gender?

Fit these two models:

  • m_bty: score ~ bty_avg

  • m_bty_gen: score ~ bty_avg + gender
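
For example, a sketch of the two fits and the pieces you'll need for the questions below:

# m_bty is the same model as in Part 2; m_bty_gen adds gender
m_bty     <- lm(score ~ bty_avg,          data = evals)
m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)

# Compare the beauty slopes and the adjusted R-squared values
tidy(m_bty)
tidy(m_bty_gen)
glance(m_bty)$adj.r.squared
glance(m_bty_gen)$adj.r.squared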

Then answer:

  1. What changes in the beauty slope when gender is added?

  2. For two professors with the same beauty rating, does gender still shift the predicted score?

  3. Compare the adjusted \(R^2\) values. Is gender actually helping much, or is beauty doing most of the work already?

Finally, swap gender out and add rank instead.

  4. Fit m_bty_rank: score ~ bty_avg + rank and then interpret one rank coefficient and the beauty slope in context.
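
The fit follows the same pattern as before, for example:

# Beauty plus rank as predictors
m_bty_rank <- lm(score ~ bty_avg + rank, data = evals)
tidy(m_bty_rank)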

Part 5: The search for the best model

Going forward, only consider the following variables as potential predictors: rank, ethnicity, gender, language, age, cls_perc_eval, cls_did_eval, cls_students, cls_level, cls_profs, cls_credits, bty_avg.

  1. Which variable, on its own, would you expect to be the worst predictor of evaluation scores? Why? Hint: Think about which variable you would expect to have no association with the professor’s score.

  2. Check your suspicions from the previous exercise. Include the model output for that variable in your response.

  3. Suppose you wanted to fit a full model with the variables listed above. If you are already going to include cls_perc_eval and cls_students, which variable should you not include as an additional predictor? Why?

  4. Fit a full model with all of the predictors listed above, except for the one you decided to exclude in the previous question. (A sketch of a possible starting point appears after this list.)

  5. Using backward selection with adjusted \(R^2\) as the selection criterion, determine the best model. You do not need to show all steps in your answer, just the output for the final model. Also, write out the linear model for predicting score based on the final model you settle on.

  6. Interpret the slopes of one numerical and one categorical predictor based on your final model.

  7. Based on your final model, describe the characteristics of a professor and course at University of Texas at Austin that would be associated with a high evaluation score.

  8. Would you be comfortable generalizing your conclusions to apply to professors generally (at any university)? Why or why not?
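
Here is a sketch of a possible starting point for the full model and one backward-selection step. It assumes, purely for illustration, that cls_did_eval is the variable you chose to exclude and that cls_profs is the first candidate you consider dropping; substitute your own choices.

# Full model, assuming cls_did_eval was the excluded variable
m_full <- lm(
  score ~ rank + ethnicity + gender + language + age + cls_perc_eval +
    cls_students + cls_level + cls_profs + cls_credits + bty_avg,
  data = evals
)
glance(m_full)$adj.r.squared

# One backward-selection step: drop a candidate predictor (here, cls_profs)
# and check whether the adjusted R-squared goes up
m_step <- lm(
  score ~ rank + ethnicity + gender + language + age + cls_perc_eval +
    cls_students + cls_level + cls_credits + bty_avg,
  data = evals
)
glance(m_step)$adj.r.squared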


  1. Daniel S. Hamermesh and Amy Parker, Beauty in the classroom: Instructors’ pulchritude and putative pedagogical productivity, Economics of Education Review, Volume 24, Issue 4, August 2005, Pages 369–376, ISSN 0272-7757, https://doi.org/10.1016/j.econedurev.2004.07.013.