**R-squared**, often written R², is the proportion of the variance in the response variable that can be explained by the predictor variables in a linear regression model.

For a least-squares model with an intercept, evaluated on the data it was fit to, R-squared ranges from 0 to 1, where:

- **0** indicates that the response variable cannot be explained by the predictor variables at all.
- **1** indicates that the response variable can be perfectly explained, without error, by the predictor variables.
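Concretely, R-squared is usually defined by comparing the model's residual sum of squares with the total sum of squares of the response around its mean. Here $y_i$ are the observed responses, $\hat{y}_i$ the fitted values, and $\bar{y}$ the mean response:

$$
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
    = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

When the model includes an intercept and is evaluated on its own training data, $0 \le SS_{\text{res}} \le SS_{\text{tot}}$. This is why the value falls between 0 and 1.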

The following example shows how to calculate R² for a regression model in Python.

**Example: Calculate R-Squared in Python**

Suppose we have the following pandas DataFrame:

```python
import pandas as pd

# create DataFrame
df = pd.DataFrame({'hours': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4, 3, 6],
                   'prep_exams': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4, 3, 2],
                   'score': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90, 75, 96]})

# view DataFrame
print(df)
```

```
    hours  prep_exams  score
0       1           1     76
1       2           3     78
2       2           3     85
3       4           5     88
4       2           2     72
5       1           2     69
6       5           1     94
7       4           1     94
8       2           0     88
9       4           3     92
10      4           4     90
11      3           3     75
12      6           2     96
```

We can use the **LinearRegression()** class from sklearn to fit a regression model and its **score()** method to calculate the R-squared value for the model:

```python
from sklearn.linear_model import LinearRegression

# instantiate linear regression model
model = LinearRegression()

# define predictor and response variables
X, y = df[["hours", "prep_exams"]], df["score"]

# fit regression model
model.fit(X, y)

# calculate R-squared of regression model
r_squared = model.score(X, y)

# view R-squared value
print(r_squared)
```

```
0.7175541714105901
```

The R-squared of the model turns out to be **0.7176**.

This means that **71.76%** of the variation in the exam scores can be explained by the number of hours studied and the number of prep exams taken.
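To check where this number comes from, R-squared can also be computed directly from its definition: one minus the residual sum of squares divided by the total sum of squares. The sketch below refits the same model on the same data. It then compares the hand-computed value with `model.score()` and with `sklearn.metrics.r2_score`, and all three should agree.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# same data as in the example
df = pd.DataFrame({'hours': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4, 3, 6],
                   'prep_exams': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4, 3, 2],
                   'score': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90, 75, 96]})

X, y = df[["hours", "prep_exams"]], df["score"]
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

# residual sum of squares: variation the model leaves unexplained
ss_res = np.sum((y - y_pred) ** 2)
# total sum of squares: variation of the response around its mean
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared_manual = 1 - ss_res / ss_tot

print(r_squared_manual)        # same value as model.score(X, y)
print(r2_score(y, y_pred))     # sklearn's metric function agrees
```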

If we’d like, we could then compare this R-squared value to that of another regression model with a different set of predictor variables.
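As a minimal sketch of such a comparison, the code below fits one model with `hours` alone and another with both predictors, then prints each model's R-squared. The dictionary of predictor sets is just an illustrative way to organize the candidates.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({'hours': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4, 3, 6],
                   'prep_exams': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4, 3, 2],
                   'score': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90, 75, 96]})
y = df["score"]

# candidate sets of predictor variables (illustrative choice)
predictor_sets = {
    "hours only": ["hours"],
    "hours + prep_exams": ["hours", "prep_exams"],
}

r2_by_model = {}
for name, cols in predictor_sets.items():
    X = df[cols]
    model = LinearRegression().fit(X, y)
    r2_by_model[name] = model.score(X, y)
    print(f"{name}: R-squared = {r2_by_model[name]:.4f}")
```

On this data the hours-only model explains less of the variation (roughly 0.686) than the model with both predictors (0.7176).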

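In general, a higher R-squared means the model's predictor variables explain more of the variation in the response variable. Keep in mind that adding a predictor never lowers R-squared on the data the model was fit to. When comparing models with different numbers of predictors, adjusted R-squared, which penalizes extra predictors, gives a fairer comparison.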
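Adjusted R-squared can be computed from the ordinary value with the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors (not counting the intercept). The helper `adjusted_r2` below is a hypothetical name for this sketch, not an sklearn function.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

df = pd.DataFrame({'hours': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4, 3, 6],
                   'prep_exams': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4, 3, 2],
                   'score': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90, 75, 96]})

X, y = df[["hours", "prep_exams"]], df["score"]
model = LinearRegression().fit(X, y)

r2 = model.score(X, y)
n, p = X.shape  # 13 observations, 2 predictors
adj = adjusted_r2(r2, n, p)

print(f"R-squared:          {r2:.4f}")
print(f"Adjusted R-squared: {adj:.4f}")
```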

**Related:** What is a Good R-squared Value?

**Additional Resources**

The following tutorials explain how to perform other common operations in Python:

How to Perform Simple Linear Regression in Python

How to Perform Multiple Linear Regression in Python

How to Calculate AIC of Regression Models in Python

I want to perform a multiple linear regression of price on several variables, but I am getting an error.

```python
from sklearn.linear_model import LinearRegression

X, Y = df[["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view",
           "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]], df.price

lm = LinearRegression()
lm.fit(X, Y)
```

```
ValueError: Input X contains NaN.
LinearRegression does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values
```

How do I fit the multiple linear regression and get the R-squared value?
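The error means some rows have missing values (NaN) in the predictor columns. The error message itself points to two common fixes: drop the incomplete rows, or fill them in with an imputer inside a pipeline. After either fix, `score()` returns R-squared as in the tutorial. The sketch below shows both. The housing data is not available here, so the DataFrame is a small hypothetical stand-in with made-up numbers and only three of the columns. Swap in your own `df` and full column list.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# hypothetical stand-in for the housing data (made-up values);
# np.nan marks missing entries, which is what triggers the ValueError
df = pd.DataFrame({
    "bedrooms":    [3, 2, 4, np.nan, 3, 5, 2, 4],
    "bathrooms":   [1.0, 1.0, 2.5, 2.0, np.nan, 3.0, 1.5, 2.0],
    "sqft_living": [1180, 770, 2570, 1960, 1680, 3560, 1060, 2200],
    "price":       [221900, 180000, 538000, 604000, 510000, 1230000, 257500, 468000],
})
features = ["bedrooms", "bathrooms", "sqft_living"]  # replace with your own columns

# first, see which columns contain missing values
print(df[features + ["price"]].isna().sum())

# option 1: drop rows with a missing value in any predictor or in the response
clean = df.dropna(subset=features + ["price"])
X, y = clean[features], clean["price"]
lm = LinearRegression().fit(X, y)
r2_dropped = lm.score(X, y)
print("R-squared (incomplete rows dropped):", r2_dropped)

# option 2: keep every row and fill missing predictor values with the column mean
X_all, y_all = df[features], df["price"]
pipe = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
pipe.fit(X_all, y_all)
r2_imputed = pipe.score(X_all, y_all)  # Pipeline.score applies the imputer, then scores
print("R-squared (mean-imputed):", r2_imputed)
```

Dropping rows is simplest when only a few are incomplete. Imputation keeps all rows but fills in guessed values, so the two approaches can give different R-squared values.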