How to Calculate Adjusted R-Squared in Python


R-squared, often written R², is the proportion of the variance in the response variable that can be explained by the predictor variables in a linear regression model.

For a least-squares model with an intercept, evaluated on the data it was fit to, R-squared ranges from 0 to 1. A value of 0 indicates that the response variable cannot be explained by the predictor variables at all, while a value of 1 indicates that the response variable can be explained perfectly, without error, by the predictor variables.
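The "proportion of variance explained" can be written as R² = 1 - SS_res/SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of the response. The sketch below assumes NumPy and uses made-up data to compute this by hand for a one-predictor fit. For simple linear regression with an intercept, the result should also equal the squared Pearson correlation between x and y.

```python
import numpy as np

# Made-up example data (not from the tutorial's dataset)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

# Fit y = b1*x + b0 by ordinary least squares (polyfit returns slope first)
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b1 * x + b0

ss_res = np.sum((y - y_hat) ** 2)      # variation left unexplained by the fit
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
r2 = 1 - ss_res / ss_tot

# For a one-predictor fit with an intercept, R-squared equals the squared correlation
r = np.corrcoef(x, y)[0, 1]
print(r2, r ** 2)
```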

The adjusted R-squared is a modified version of R-squared that adjusts for the number of predictors in a regression model. It is calculated as:

Adjusted R² = 1 - [(1 - R²) * (n - 1) / (n - k - 1)]

where:

  • R²: the R² of the model
  • n: the number of observations
  • k: the number of predictor variables (not counting the intercept)
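As a quick check of the formula, the hypothetical helper below plugs in illustrative numbers chosen only for the arithmetic: R² = 0.80, n = 32 observations and k = 4 predictors. Then (n - 1)/(n - k - 1) = 31/27, so adjusted R² = 1 - 0.20 × 31/27 ≈ 0.7704.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Hypothetical helper: adjusted R-squared for k predictors fit on n observations.

    k counts predictor variables only (not the intercept); requires n > k + 1.
    """
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Illustrative numbers only
print(round(adjusted_r2(0.80, n=32, k=4), 4))  # 0.7704
```

With k = 0 the factor (n - 1)/(n - k - 1) equals 1, so adjusted R² equals R²; every added predictor makes the factor larger and the penalty heavier.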

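Because R² never decreases (and almost always increases) when you add predictors to a model, even useless ones, it can make a model with extra predictors look better than it is. Adjusted R² penalizes each additional predictor through the (n - 1)/(n - k - 1) factor, so it rises only when a new predictor improves the fit enough to offset the lost degree of freedom. This makes it a more useful metric for comparing models with different numbers of predictors.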
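To see the penalty in action, the sketch below uses NumPy only and simulated data (the seed and sample size are arbitrary choices). It fits the same response twice: once with one real predictor, and once with five extra columns of pure noise added. R² for the larger model cannot be lower, because least squares can always set the extra coefficients to zero. Adjusted R² charges for the five extra predictors and will usually, though not on every random draw, come out lower.

```python
import numpy as np

def ols_r2(X, y):
    # Fit OLS with an intercept via least squares and return the in-sample R-squared
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)        # y depends on x only
noise = rng.normal(size=(n, 5))       # 5 columns unrelated to y

r2_small = ols_r2(x.reshape(-1, 1), y)
r2_big = ols_r2(np.column_stack([x, noise]), y)

print("1 predictor :", r2_small, adjusted_r2(r2_small, n, 1))
print("6 predictors:", r2_big, adjusted_r2(r2_big, n, 6))
```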

This tutorial shows two examples of how to calculate adjusted R² for a regression model in Python.


Example 1: Calculate Adjusted R-Squared with sklearn

The following code shows how to fit a multiple linear regression model and calculate the adjusted R-squared of the model using sklearn:

from sklearn.linear_model import LinearRegression
import pandas as pd

#define URL where dataset is located
url = "https://raw.githubusercontent.com/Statology/Python-Guides/main/mtcars.csv"

#read in data
data = pd.read_csv(url)

#fit regression model
model = LinearRegression()
X, y = data[["mpg", "wt", "drat", "qsec"]], data.hp
model.fit(X, y)

#calculate adjusted R-squared (model.score returns the R-squared on the training data)
r2 = model.score(X, y)
n, k = X.shape
print(1 - (1 - r2)*(n - 1)/(n - k - 1))

0.7787005290062521

The adjusted R-squared of the model turns out to be 0.7787.
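The calculation above can be wrapped in a small reusable function. The helper below is a hypothetical convenience function, not part of sklearn. It applies `sklearn.metrics.r2_score` to the model's predictions and takes k from the number of columns in X, which assumes X holds only predictors (sklearn's `LinearRegression` fits the intercept separately by default). So that it runs without downloading the mtcars file, it is demonstrated on simulated data from `make_regression`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

def adjusted_r2_score(y_true, y_pred, n_predictors):
    """Hypothetical helper: adjusted R-squared from true values and predictions."""
    n = len(y_true)
    r2 = r2_score(y_true, y_pred)
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

# Simulated data standing in for the mtcars file (32 rows, 4 predictors)
X, y = make_regression(n_samples=32, n_features=4, noise=10.0, random_state=1)

model = LinearRegression().fit(X, y)
r2 = model.score(X, y)
adj = adjusted_r2_score(y, model.predict(X), n_predictors=X.shape[1])
print(r2, adj)
```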

Example 2: Calculate Adjusted R-Squared with statsmodels

The following code shows how to fit a multiple linear regression model and calculate the adjusted R-squared of the model using statsmodels:

import statsmodels.api as sm
import pandas as pd

#define URL where dataset is located
url = "https://raw.githubusercontent.com/Statology/Python-Guides/main/mtcars.csv"

#read in data
data = pd.read_csv(url)

#fit regression model
X, y = data[["mpg", "wt", "drat", "qsec"]], data.hp
#add an intercept column (sm.OLS does not add one automatically)
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()

#display adjusted R-squared
print(model.rsquared_adj)

0.7787005290062521

The adjusted R-squared of the model turns out to be 0.7787, which matches the result from the previous example.

