When we fit linear regression models we often calculate the **R-squared** value of the model.

The R-squared value is the proportion of the variance in the response variable that can be explained by the predictor variables in the model.

The value for R-squared ranges from 0 to 1, where:

- A value of **0** indicates that the response variable cannot be explained by the predictor variables at all.
- A value of **1** indicates that the response variable can be perfectly explained by the predictor variables.
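As a minimal sketch of what this proportion means, the snippet below computes R-squared with the standard definition, 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the four data points are made up for illustration.

```python
def r_squared(y, y_hat):
    """Proportion of the variance in y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    return 1 - ss_res / ss_tot

# Hypothetical response values, chosen only to show the two extremes.
y = [2.0, 4.0, 6.0, 8.0]

perfect = r_squared(y, y)               # fitted values match the data exactly
mean_only = r_squared(y, [5.0] * 4)     # fitted values are just the mean of y

print(perfect)    # 1.0
print(mean_only)  # 0.0
```

A model that predicts the mean of the response for every observation explains none of the variance, which is why it lands at exactly 0.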

Although this metric is commonly used to assess how well a regression model fits a dataset, it has one serious drawback:

The drawback of R-squared:

R-squared never decreases when a new predictor variable is added to the regression model, and in practice it almost always increases.

Even if a new predictor variable is almost completely unrelated to the response variable, the R-squared value of the model will typically increase, if only by a small amount.

For this reason, it’s possible that a regression model with a large number of predictor variables has a high R-squared value, even if the model doesn’t fit the data well.
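The drawback can be checked with a small simulation. The sketch below is illustrative only: the data are randomly generated, the helper names `solve` and `ols_r_squared` are hypothetical, and the least-squares fit is done by solving the normal equations in plain Python rather than with a statistics library. It fits one model with a real predictor and a second model that adds a pure-noise predictor, then prints both R-squared values.

```python
import random

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols_r_squared(columns, y):
    """Fit y on the predictor columns (plus an intercept) and return R-squared."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

random.seed(0)
n = 30
x = [random.uniform(0, 10) for _ in range(n)]       # a genuinely useful predictor
noise = [random.gauss(0, 1) for _ in range(n)]      # unrelated to y by construction
y = [3 + 2 * xi + random.gauss(0, 2) for xi in x]

r2_one = ols_r_squared([x], y)          # model with x only
r2_two = ols_r_squared([x, noise], y)   # model with x and the noise predictor

print(f"R-squared with x only:      {r2_one:.4f}")
print(f"R-squared with x and noise: {r2_two:.4f}")
```

Whatever seed is used, the second value should not be smaller than the first, because the larger model can always reproduce the smaller model's fit by giving the extra predictor a coefficient of zero.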

Fortunately, there is an alternative to R-squared known as **adjusted R-squared**.

The **adjusted R-squared** is a modified version of R-squared that adjusts for the number of predictors in a regression model.

It is calculated as:

$$\text{Adjusted } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$

where:

- $R^2$: The R-squared of the model
- $n$: The number of observations
- $k$: The number of predictor variables
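The formula translates directly into a few lines of code. The following is a minimal sketch; the function name `adjusted_r_squared` and the input values (R-squared of 0.80, 50 observations, 4 predictors) are hypothetical and chosen only to show the arithmetic.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for a model with k predictors fit to n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical inputs, not taken from any real dataset.
value = adjusted_r_squared(r2=0.80, n=50, k=4)
print(round(value, 4))  # 0.7822
```

The guard on `n - k - 1` is a design choice for this sketch: the formula is undefined when the denominator is zero and not meaningful when it is negative.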

Because R-squared never decreases as you add more predictors to a model, adjusted R-squared gives a fairer picture of how useful a model is, *adjusted for the number of predictors in the model*.

The advantage of Adjusted R-squared:

Adjusted R-squared tells us how well a set of predictor variables is able to explain the variation in the response variable, adjusted for the number of predictors in a model.

Because of the way it’s calculated, adjusted R-squared can be used to compare the fit of regression models with different numbers of predictor variables.
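One way to see how this comparison works is to ask how much R-squared must rise for an extra predictor to pay for itself. Setting the adjusted R-squared of a model with k predictors equal to that of a model with k + 1 predictors and solving for the new R-squared gives the break-even value computed below. This is a sketch derived from the formula above; the function names and the inputs (R-squared of 0.90, 25 observations, 2 predictors) are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for a model with k predictors fit to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def break_even_r_squared(r2_old, n, k):
    """R-squared a (k + 1)-predictor model needs in order to match the
    adjusted R-squared of a k-predictor model with R-squared r2_old."""
    return 1 - (1 - r2_old) * (n - k - 2) / (n - k - 1)

# Hypothetical starting model: R-squared of 0.90, 25 observations, 2 predictors.
threshold = break_even_r_squared(0.90, n=25, k=2)

adj_old = adjusted_r_squared(0.90, n=25, k=2)
adj_new = adjusted_r_squared(threshold, n=25, k=3)

print(f"break-even R-squared: {threshold:.4f}")
print(f"adjusted R-squared, 2 predictors: {adj_old:.4f}")
print(f"adjusted R-squared, 3 predictors at break-even: {adj_new:.4f}")
```

If the new predictor lifts R-squared above the break-even value, adjusted R-squared goes up; if the gain is smaller than that, adjusted R-squared goes down even though R-squared itself did not fall.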

To gain a better understanding of adjusted R-squared, check out the following example.

**Example: Understanding Adjusted R-Squared in Regression Models**

Suppose a professor collects data on students in his class and fits the following regression model to understand how hours spent studying and current grade in the class affect the score a student receives on the final exam.

$$\text{Exam Score} = \beta_0 + \beta_1(\text{hours spent studying}) + \beta_2(\text{current grade})$$

Suppose this regression model has the following metrics:

- R-squared: **0.955**
- Adjusted R-squared: **0.946**

Now suppose the professor decides to collect data on another variable for each student: shoe size.

Although this variable should be completely unrelated to the final exam score, he decides to fit the following regression model:

$$\text{Exam Score} = \beta_0 + \beta_1(\text{hours spent studying}) + \beta_2(\text{current grade}) + \beta_3(\text{shoe size})$$

Suppose this regression model has the following metrics:

- R-squared: **0.965**
- Adjusted R-squared: **0.902**

If we only looked at the **R-squared** values for each of these two regression models, we would conclude that the second model is better to use because it has a higher R-squared value!

However, if we look at the **adjusted R-squared** values then we come to a different conclusion: The first model is better to use because it has a higher adjusted R-squared value.

The second model only has a higher R-squared value because it has more predictor variables than the first model.

However, the predictor variable that we added (shoe size) was a poor predictor of final exam score, so the adjusted R-squared value penalized the model for adding this predictor variable.

This example illustrates why adjusted R-squared is a better metric to use when comparing the fit of regression models with different numbers of predictor variables.

**Additional Resources**

The following tutorials explain how to calculate adjusted R-squared values using different statistical software:

How to Calculate Adjusted R-Squared in R

How to Calculate Adjusted R-Squared in Excel

How to Calculate Adjusted R-Squared in Python