How to Use the stat_poly_eq() Function in R


Polynomial regression is a technique we can use when the relationship between a predictor variable and a response variable is nonlinear.

This type of regression takes the form:

Y = β_0 + β_1 X + β_2 X^2 + … + β_h X^h + ε

where h is the “degree” of the polynomial.
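
To make the formula concrete, here is a minimal sketch in base R for the case h = 2. The vector x holds made-up values used only for illustration (it is not the dataset used later), and poly_terms is an illustrative name. It shows that poly(x, 2, raw = TRUE) builds the X and X^2 columns that appear in the equation:

```r
#hypothetical predictor values, used only for illustration
x <- c(1, 2, 3, 4)

#build the raw polynomial terms for a degree of 2
poly_terms <- poly(x, 2, raw = TRUE)

#first column holds x, second column holds x^2
poly_terms[, 1]
poly_terms[, 2]
```

With raw = TRUE the columns are plain powers of x, so the fitted coefficients correspond to the β terms in the equation above.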

Often you may want to fit a polynomial regression model in R, plot the regression model, and then display the R-squared value of the model on the plot.

The easiest way to do so is by using the stat_poly_eq() function from the ggpmisc package in R, which is designed to perform this exact task.

The stat_poly_eq() function uses the following basic syntax:
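
A minimal sketch of that syntax, assuming the ggplot2 and ggpmisc packages are installed. The data frame demo_df and its columns x and y are placeholders invented for this sketch, not part of the package:

```r
library(ggplot2)
library(ggpmisc)

#placeholder data, used only to make the sketch runnable
demo_df <- data.frame(x = 1:10, y = (1:10)^2 + c(1, -1))

#model formula, written in terms of the x and y aesthetics
formula <- y ~ poly(x, 2, raw = TRUE)

#add a label for the fitted polynomial model to a scatterplot
p <- ggplot(demo_df, aes(x, y)) +
  geom_point() +
  stat_poly_eq(formula = formula, parse = TRUE)

p
```

The formula is written with x and y rather than the column names because stat_poly_eq() fits the model to the variables mapped to the x and y aesthetics.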

The following example shows how to use the stat_poly_eq() function in practice.

Example: How to Use the stat_poly_eq() Function in R

Suppose that we create a dataset that contains the number of hours studied and final exam score for a class of 50 students:

#make this example reproducible
set.seed(1)

#create dataset
df <- data.frame(hours = runif(50, 5, 15), score=50)
df$score = df$score + df$hours^3/150 + df$hours*runif(50, 1, 2)

#view first six rows of data
head(df)

      hours    score
1  7.655087 64.30191
2  8.721239 70.65430
3 10.728534 73.66114
4 14.082078 86.14630
5  7.016819 59.81595
6 13.983897 83.60510

We can use the following syntax to fit a polynomial regression model to this dataset:

#fit polynomial regression model with degree of 2
fit = lm(score ~ poly(hours, 2, raw=T), data=df)

#view summary of model
summary(fit)

Call:
lm(formula = score ~ poly(hours, 2, raw = T), data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.6589 -2.0770 -0.4599  2.5923  4.5122 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)              54.00526    5.52855   9.768 6.78e-13 ***
poly(hours, 2, raw = T)1 -0.07904    1.15413  -0.068  0.94569    
poly(hours, 2, raw = T)2  0.18596    0.05724   3.249  0.00214 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.8 on 47 degrees of freedom
Multiple R-squared:   0.93,	Adjusted R-squared:  0.927 
F-statistic: 312.1 on 2 and 47 DF,  p-value: < 2.2e-16

From the output we can see that the Multiple R-Squared value of the model is 0.93.
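
If you need this value programmatically rather than reading it off the printed summary, a short sketch like the following can help. It recreates the same data and model as above, pulls the value from the summary object, and checks it against the definition 1 - (residual sum of squares / total sum of squares). The names r2 and r2_manual are illustrative:

```r
#recreate the dataset and model from the example above
set.seed(1)
df <- data.frame(hours = runif(50, 5, 15), score = 50)
df$score <- df$score + df$hours^3/150 + df$hours*runif(50, 1, 2)
fit <- lm(score ~ poly(hours, 2, raw = TRUE), data = df)

#extract the Multiple R-squared value from the model summary
r2 <- summary(fit)$r.squared

#compute the same quantity by hand: 1 - (residual SS / total SS)
r2_manual <- 1 - sum(residuals(fit)^2) / sum((df$score - mean(df$score))^2)

#view the rounded value and confirm both calculations agree
round(r2, 2)
all.equal(r2, r2_manual)
```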

Suppose that we would like to create a scatterplot of the data and then display this Multiple R-Squared value on the plot itself so that we can get an idea of how well the model fits the data.

We can use the following syntax with the stat_poly_eq() function to do so:

library(ggplot2)
library(ggpmisc)

#store model formula (written in terms of the x and y aesthetics)
formula <- y ~ poly(x, 2, raw = TRUE)

#create scatterplot with R-squared value shown in plot
ggplot(df, aes(hours, score)) +
  geom_point() +
  geom_smooth(method = "lm", formula = formula) +
  stat_poly_eq(formula = formula, parse = TRUE)

This produces a scatterplot of the data with the fitted regression curve and the R-squared label overlaid.

The x-axis represents the values from the hours column and the y-axis represents the values from the score column of the data frame.

Notice that the Multiple R-Squared value of 0.93 is displayed in the top left corner of the chart, which matches the value shown in the previous regression output.

Note that we also used the geom_smooth() function to plot the fitted regression curve so that we can see how well the model fits the underlying data.

This layer is optional, but it gives the reader a fitted curve to compare against the Multiple R-Squared value.
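
stat_poly_eq() also computes a label for the fitted equation itself. As a sketch, assuming a version of ggpmisc that exposes the eq.label and rr.label computed variables and a version of ggplot2 that provides after_stat() (3.3.0 or later), both labels can be pasted into a single plotmath expression. The object name p_eq is illustrative:

```r
library(ggplot2)
library(ggpmisc)

#recreate the dataset from the example above
set.seed(1)
df <- data.frame(hours = runif(50, 5, 15), score = 50)
df$score <- df$score + df$hours^3/150 + df$hours*runif(50, 1, 2)

#model formula, written in terms of the x and y aesthetics
formula <- y ~ poly(x, 2, raw = TRUE)

#show both the fitted equation and the R-squared value in the plot
p_eq <- ggplot(df, aes(hours, score)) +
  geom_point() +
  geom_smooth(method = "lm", formula = formula) +
  stat_poly_eq(aes(label = paste(after_stat(eq.label),
                                 after_stat(rr.label), sep = "~~~")),
               formula = formula, parse = TRUE)

p_eq
```

The "~~~" separator is plotmath spacing, so it only renders correctly when parse = TRUE.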
