73 Lab: Modeling professor attractiveness and course evaluations
Why are hot professors “better” teachers?
At the end of most college courses, students are asked to evaluate the class and the instructor—usually anonymously, often hastily, sometimes with one hand already on the doorknob. These are often used to assess instructor effectiveness, allocate merit raises, and sometimes even decide whether people keep their jobs.
But are course evaluations actually measuring teaching quality? Or are they picking up on other things—like a professor’s appearance?
In a now-classic economics paper, Daniel Hamermesh and Amy Parker looked at whether professors who are considered more physically attractive get higher evaluation scores. The short answer? Yeah, they do. You can read their study here.4 The dataset we’ll use comes from a slightly modified version of the replication data included with Data Analysis Using Regression and Multilevel/Hierarchical Models by Gelman and Hill.
In this lab, you’ll explore that dataset—focusing on one predictor at a time—to get a feel for how linear models behave, how to interpret them, and how to visualize their results. Along the way, you’ll also get a preview of just how messy “evaluation” can be when the outcome depends on variables that have nothing to do with teaching.
The data
The dataset we’ll be using is called evals from the openintro package. Take a peek at the codebook with ?evals.
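To follow along, you might start with a setup chunk like the following. This is a sketch, assuming the openintro, tidyverse, and broom packages are installed:

library(tidyverse)  # ggplot2, dplyr, and friends
library(broom)      # tidy() and glance() for model output
library(openintro)  # home of the evals dataset

glimpse(evals)      # one row per course; includes score, bty_avg, rank, gender, ...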
Exercises
Part 1: Exploratory Data Analysis
1. Visualize the distribution of `score`. Is the distribution skewed? What does that tell you about how students rate courses? Is this what you expected to see? Why, or why not? Include any summary statistics and visualizations you use in your response. (A starter sketch for the plots in this part follows the list.)

2. Create a scatterplot of `score` versus `bty_avg` (a professor's average beauty rating). Describe any pattern you observe: does there appear to be a trend, clustering, or wide variation? Don't overthink it; just describe what you see. Hint: See the help page for the function at http://ggplot2.tidyverse.org/reference/index.html.

3. Recreate your scatterplot from Exercise 2, but use `geom_jitter()` instead of `geom_point()`. What does jittering do, and why might it improve the plot? Was anything misleading or hidden in the original version?
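Here is a minimal starter sketch for the three plots above, assuming the packages from the setup chunk are loaded. The binwidth is only a starting point; adjust it until the shape of the distribution is clear.

ggplot(evals, aes(x = score)) +
  geom_histogram(binwidth = 0.25)  # Exercise 1: distribution of scores

ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_point()                     # Exercise 2: many points overlap exactly

ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_jitter()                    # Exercise 3: small random noise separates overlapping points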
Part 2: Linear regression with a numerical predictor
Recall: a linear model has the form \(\hat{y} = b_0 + b_1 x\).
4. Let's see if the apparent trend in the plot is something more than natural variation. Fit a linear model called `m_bty` to predict average professor evaluation `score` by average beauty rating (`bty_avg`). Based on the regression output, write the linear model. (A sketch of the fit and the plot for the next exercise follows this list.)

5. Replot your visualization from Exercise 3, this time adding a regression line in orange and turning off the default shading around the line. What does that shading represent? And speculate why I'm asking you to turn it off.

6. What does the slope of the model tell you? Interpret it in the context of this data: what does it say about how evaluation scores change with beauty ratings?

7. What does the intercept represent in this model? Is it meaningful in this context, or just a mathematical artifact? Explain your reasoning.

8. What is the \(R^2\) value of this model? Interpret it in context: how much of the variation in evaluation scores is explained by beauty ratings?
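One way to set this up, sketched here rather than prescribed: `se = FALSE` is the argument that removes the default shading, and `glance()` reports \(R^2\).

m_bty <- lm(score ~ bty_avg, data = evals)
tidy(m_bty)               # intercept and slope estimates
glance(m_bty)$r.squared   # R² for Exercise 8

ggplot(evals, aes(x = bty_avg, y = score)) +
  geom_jitter() +
  geom_smooth(method = "lm", se = FALSE, color = "orange")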
Part 3: Linear regression with a categorical predictor
Let’s switch gears from numeric predictors to categorical ones. Beauty scores might be (somewhat) continuous, but characteristics like gender and rank are categorical, meaning they fall into distinct groups.
We’ll start by seeing whether evaluation scores differ by gender.
m_gen <- lm(score ~ gender, data = evals)
tidy(m_gen)
#> # A tibble: 2 × 5
#>   term        estimate std.error statistic  p.value
#>   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)    4.09     0.0387    106.   0
#> 2 gendermale     0.142    0.0508      2.78 0.00558

9. Take a look at the model output. What's the reference level? What do the coefficients tell you about how evaluation scores differ between male and female professors?
10. What is the equation of the line corresponding to male professors? What is it for female professors?
11. Fit a new linear model called `m_rank` to predict average professor evaluation `score` based on `rank` of the professor. Based on the regression output, write the linear model and interpret the slopes and intercept in context of the data.

12. Create a new variable called `rank_relevel` where "tenure track" is the baseline level. (A sketch of the releveling and regrouping steps follows this list.)

13. Fit a new linear model called `m_rank_relevel` to predict average professor evaluation `score` based on the `rank_relevel` of the professor. This is the new (releveled) variable you created in the previous exercise. Based on the regression output, write the linear model and interpret the slopes and intercept in context of the data. Also determine and interpret the \(R^2\) of the model.

14. Create another new variable called `tenure_eligible` that labels "teaching" faculty as "no" and labels "tenure track" and "tenured" faculty as "yes".

15. Fit a new linear model called `m_tenure_eligible` to predict average professor evaluation `score` based on the `tenure_eligible` status of the professor. This is the new (regrouped) variable you created in the previous exercise. Based on the regression output, write the linear model and interpret the slopes and intercept in context of the data. Also determine and interpret the \(R^2\) of the model.
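A minimal sketch of those data-wrangling steps, assuming the tidyverse is loaded (`fct_relevel()` comes from forcats):

evals <- evals |>
  mutate(
    rank_relevel    = fct_relevel(rank, "tenure track"),        # make "tenure track" the baseline
    tenure_eligible = if_else(rank == "teaching", "no", "yes")  # regroup the three ranks into two
  )

m_rank            <- lm(score ~ rank, data = evals)
m_rank_relevel    <- lm(score ~ rank_relevel, data = evals)
m_tenure_eligible <- lm(score ~ tenure_eligible, data = evals)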
Part 4: Multiple linear regression
16. Fit a linear model (one you have fit before): `m_bty`, predicting average professor evaluation `score` based on average beauty rating (`bty_avg`) only. Write the linear model, and note the \(R^2\) and the adjusted \(R^2\).

17. Fit a new linear model: `m_bty_gen`, predicting average professor evaluation `score` based on average beauty rating (`bty_avg`) and `gender`. Write the linear model, and note the \(R^2\) and the adjusted \(R^2\). (A sketch of the fit follows this list.)

18. Interpret the slope and intercept of `m_bty_gen` in context of the data.

19. What percent of the variability in `score` is explained by the model `m_bty_gen`?

20. What is the equation of the line corresponding to just male professors?

21. For two professors who received the same beauty rating, which gender tends to have the higher course evaluation score?

22. How does the relationship between beauty and evaluation score vary between male and female professors?

23. How do the adjusted \(R^2\) values of `m_bty_gen` and `m_bty` compare? What does this tell us about how useful `gender` is in explaining the variability in evaluation scores when we already have information on the beauty score of the professor?

24. Compare the slopes of `bty_avg` under the two models (`m_bty` and `m_bty_gen`). Has the addition of `gender` to the model changed the parameter estimate (slope) for `bty_avg`?

25. Create a new model called `m_bty_rank` with `gender` removed and `rank` added in. Write the equation of the linear model and interpret the slopes and intercept in context of the data.
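A sketch of the multiple regression fit; `glance()` reports both \(R^2\) and adjusted \(R^2\) in one row.

m_bty_gen <- lm(score ~ bty_avg + gender, data = evals)
tidy(m_bty_gen)
glance(m_bty_gen)[, c("r.squared", "adj.r.squared")]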
Part 5: The search for the best model
Going forward, only consider the following variables as potential predictors: `rank`, `ethnicity`, `gender`, `language`, `age`, `cls_perc_eval`, `cls_did_eval`, `cls_students`, `cls_level`, `cls_profs`, `cls_credits`, `bty_avg`.
26. Which variable, on its own, would you expect to be the worst predictor of evaluation scores? Why? Hint: Think about which variable you would expect to have no association with the professor's score.

27. Check your suspicions from the previous exercise. Include the model output for that variable in your response.

28. Suppose you wanted to fit a full model with the variables listed above. If you are already going to include `cls_perc_eval` and `cls_students`, which variable should you not include as an additional predictor? Why?

29. Fit a full model with all predictors listed above, except for the one you decided to exclude in the previous question.

30. Using backward selection with adjusted \(R^2\) as the selection criterion, determine the best model. You do not need to show all steps in your answer, just the output for the final model. Also, write out the linear model for predicting score based on the final model you settle on. (A sketch of one backward-selection pass follows this list.)

31. Interpret the slopes of one numerical and one categorical predictor based on your final model.

32. Based on your final model, describe the characteristics of a professor and course at the University of Texas at Austin that would be associated with a high evaluation score.

33. Would you be comfortable generalizing your conclusions to apply to professors generally (at any university)? Why or why not?
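Here is a manual sketch of one backward-selection pass, an illustration rather than the only way to do this. It refits the model once per predictor, dropping that predictor, and reports each reduced model's adjusted \(R^2\). Remove the predictor you flagged in Exercise 28 from the formula before running it.

m_full <- lm(score ~ rank + ethnicity + gender + language + age +
               cls_perc_eval + cls_did_eval + cls_students + cls_level +
               cls_profs + cls_credits + bty_avg,
             data = evals)   # drop your excluded predictor from this formula first

# Adjusted R² of each model that omits one predictor: drop the predictor
# whose removal raises adjusted R² the most, then repeat until no removal helps.
predictors <- attr(terms(m_full), "term.labels")
sapply(predictors, function(p) {
  m_reduced <- update(m_full, as.formula(paste(". ~ . -", p)))
  glance(m_reduced)$adj.r.squared
})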
Daniel S. Hamermesh and Amy Parker, "Beauty in the classroom: instructors' pulchritude and putative pedagogical productivity," Economics of Education Review, Volume 24, Issue 4, August 2005, Pages 369–376, ISSN 0272-7757, doi:10.1016/j.econedurev.2004.07.013.↩︎