Polynomial regression is a technique we can use when the relationship between a predictor variable and a response variable is nonlinear.

This type of regression takes the form:

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \dots + \beta_h X^h + \varepsilon$$

where *h* is the “degree” of the polynomial.
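To make the formula concrete, here is a minimal sketch on simulated data (the data and the variable names below are illustrative assumptions, not part of the example later in this tutorial). It suggests that a degree-2 polynomial written out term by term and the same model written with `poly(x, 2, raw = TRUE)` estimate the same coefficients.

```r
# Simulated data for illustration only (not the tutorial's dataset)
set.seed(42)
x <- seq(1, 10, length.out = 30)
y <- 2 + 0.5 * x + 0.3 * x^2 + rnorm(30)

# Degree-2 polynomial written out term by term: beta_0 + beta_1*x + beta_2*x^2
fit_explicit <- lm(y ~ x + I(x^2))

# The same model using poly() with raw (non-orthogonal) polynomial terms
fit_poly <- lm(y ~ poly(x, 2, raw = TRUE))

# Compare the estimated coefficients, ignoring their names
same_coefs <- all.equal(unname(coef(fit_explicit)), unname(coef(fit_poly)))
same_coefs
```

Setting `raw = TRUE` keeps the coefficients interpretable as the β values in the formula above. Without it, `poly()` uses orthogonal polynomials, which give the same fitted values but different coefficients.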

Often you may want to fit a polynomial regression model in R, plot the regression model, and then display the R-squared value of the model on the plot.

The easiest way to do so is by using the **stat_poly_eq()** function from the **ggpmisc** package in R, which is designed to perform this exact task.

The **stat_poly_eq()** function is added as a layer to a ggplot2 plot and takes the model to fit through its **formula** argument.
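As a rough sketch of the general pattern, assuming the ggplot2 and ggpmisc packages are installed, the function is added to a plot like any other layer. The built-in `mtcars` dataset and the name `model_formula` are placeholders chosen for illustration. The formula is written in terms of the `x` and `y` aesthetics rather than the column names.

```r
library(ggplot2)
library(ggpmisc)

# Model formula expressed with the aesthetics x and y, not column names
model_formula <- y ~ poly(x, 2, raw = TRUE)

# mtcars is a built-in dataset, used here only as a placeholder
p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  # parse = TRUE renders the label as a plotmath expression
  stat_poly_eq(formula = model_formula, parse = TRUE)

p
```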

The following example shows how to use the **stat_poly_eq()** function in practice.

**Example: How to Use the stat_poly_eq() Function in R**

Suppose that we create a dataset that contains the number of hours studied and final exam score for a class of 50 students:

    #make this example reproducible
    set.seed(1)

    #create dataset
    df <- data.frame(hours = runif(50, 5, 15), score=50)
    df$score = df$score + df$hours^3/150 + df$hours*runif(50, 1, 2)

    #view first six rows of data
    head(df)

          hours    score
    1  7.655087 64.30191
    2  8.721239 70.65430
    3 10.728534 73.66114
    4 14.082078 86.14630
    5  7.016819 59.81595
    6 13.983897 83.60510

We can use the following syntax to fit a polynomial regression model to this dataset:

    #fit polynomial regression model with degree of 2
    fit = lm(score ~ poly(hours, 2, raw=T), data=df)

    #view summary of model
    summary(fit)

    Call:
    lm(formula = score ~ poly(hours, 2, raw = T), data = df)

    Residuals:
        Min      1Q  Median      3Q     Max 
    -5.6589 -2.0770 -0.4599  2.5923  4.5122 

    Coefficients:
                             Estimate Std. Error t value Pr(>|t|)    
    (Intercept)              54.00526    5.52855   9.768 6.78e-13 ***
    poly(hours, 2, raw = T)1 -0.07904    1.15413  -0.068  0.94569    
    poly(hours, 2, raw = T)2  0.18596    0.05724   3.249  0.00214 ** 
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 2.8 on 47 degrees of freedom
    Multiple R-squared:   0.93,  Adjusted R-squared:  0.927 
    F-statistic: 312.1 on 2 and 47 DF,  p-value: < 2.2e-16

From the output we can see that the Multiple R-Squared value of the model is **0.93**.
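Rather than reading the value off the printed summary, it can also be pulled out of the summary object directly. The sketch below rebuilds the same dataset so that it runs on its own. The names `model_summary`, `r2`, and `adj_r2` are introduced here for illustration.

```r
# Recreate the dataset from the example so this block is self-contained
set.seed(1)
df <- data.frame(hours = runif(50, 5, 15), score = 50)
df$score <- df$score + df$hours^3/150 + df$hours*runif(50, 1, 2)

# Fit the same degree-2 polynomial regression model
fit <- lm(score ~ poly(hours, 2, raw = TRUE), data = df)

# summary.lm objects store both R-squared values as list elements
model_summary <- summary(fit)
r2 <- model_summary$r.squared
adj_r2 <- model_summary$adj.r.squared

# Print both values rounded to three decimal places
round(c(r2 = r2, adj_r2 = adj_r2), 3)
```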

Suppose that we would like to create a scatterplot of the data and then display this Multiple R-Squared value on the plot itself so that we can get an idea of how well the model fits the data.

We can use the following syntax with the **stat_poly_eq()** function to do so:

    library(ggplot2)
    library(ggpmisc)

    #store model formula (written in terms of the x and y aesthetics)
    formula <- y ~ poly(x, 2, raw = TRUE)

    #create scatterplot with R-squared value shown in plot
    ggplot(df, aes(hours, score)) +
      geom_point() +
      geom_smooth(method = "lm", formula = formula) +
      stat_poly_eq(formula = formula, parse = TRUE)

This produces the following result:

The x-axis represents the values from the **hours** column and the y-axis represents the values from the **score** column of the data frame.

Notice that the Multiple R-Squared value of **0.93** is displayed in the top left corner of the chart. This matches the value shown in the previous regression output.
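If you would also like the fitted equation on the plot, one possible approach is to build the label from the `eq.label` and `rr.label` variables that **stat_poly_eq()** computes. This is a sketch that assumes a version of ggplot2 that provides `after_stat()` (3.3.0 or later). The names `model_formula` and `p_eq` are illustrative.

```r
library(ggplot2)
library(ggpmisc)

# Recreate the dataset from the example so this block is self-contained
set.seed(1)
df <- data.frame(hours = runif(50, 5, 15), score = 50)
df$score <- df$score + df$hours^3/150 + df$hours*runif(50, 1, 2)

# Model formula written in terms of the x and y aesthetics
model_formula <- y ~ poly(x, 2, raw = TRUE)

p_eq <- ggplot(df, aes(hours, score)) +
  geom_point() +
  geom_smooth(method = "lm", formula = model_formula) +
  stat_poly_eq(
    # "~~~" adds spacing between the two parts of the plotmath label
    aes(label = paste(after_stat(eq.label), after_stat(rr.label), sep = "~~~")),
    formula = model_formula,
    parse = TRUE
  )

p_eq
```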

Note that we also used the **geom_smooth()** function to plot the fitted regression curve so that we can visually see how well the fitted regression model fits the underlying data.

This layer is optional, but the fitted curve gives the reader something to compare the Multiple R-Squared value against.
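For instance, a version of the plot without the fitted curve might look like the sketch below. It also moves the label using the `label.x` and `label.y` arguments, given here as fractions of the plot area. The specific positions and the name `p_label_only` are arbitrary choices for illustration.

```r
library(ggplot2)
library(ggpmisc)

# Recreate the dataset from the example so this block is self-contained
set.seed(1)
df <- data.frame(hours = runif(50, 5, 15), score = 50)
df$score <- df$score + df$hours^3/150 + df$hours*runif(50, 1, 2)

model_formula <- y ~ poly(x, 2, raw = TRUE)

# Scatterplot with the R-squared label only (no geom_smooth layer)
p_label_only <- ggplot(df, aes(hours, score)) +
  geom_point() +
  stat_poly_eq(
    formula = model_formula,
    parse = TRUE,
    label.x = 0.95,  # near the right edge of the plot area
    label.y = 0.05   # near the bottom edge of the plot area
  )

p_label_only
```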
