How to Calculate Residual Sum of Squares in Python


A residual is the difference between an observed value and a predicted value in a regression model.

It is calculated as:

Residual = Observed value – Predicted value
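
As a minimal sketch of this definition, the snippet below computes residuals for three observations. The observed and predicted values are hypothetical and chosen only for illustration; they do not come from the dataset used later in this tutorial.

```python
# Hypothetical values for three observations (illustration only)
observed = [76.0, 78.0, 85.0]
predicted = [75.5, 79.0, 83.0]

# Residual = observed value - predicted value, computed pair by pair
residuals = [obs - pred for obs, pred in zip(observed, predicted)]

print(residuals)  # [0.5, -1.0, 2.0]
```

A positive residual means the model under-predicted that observation, and a negative residual means it over-predicted.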

One way to measure how well a regression model fits a dataset is the residual sum of squares (RSS), which is calculated as:

Residual sum of squares = Σ(e_i)²

where:

  • Σ: A Greek symbol that means “sum”
  • e_i: The i-th residual

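The lower the value, the better a model fits a dataset. Because the value depends on the number of observations and on the units of the response variable, it is most useful for comparing models fit to the same data.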
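
The formula can be sketched in plain Python as follows. The helper name `residual_sum_of_squares` and the sample values are assumptions made for this illustration, not part of any library.

```python
def residual_sum_of_squares(observed, predicted):
    """Return the sum of squared residuals for paired observed/predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    # Square each residual (observed - predicted) and add them up
    return sum((obs - pred) ** 2 for obs, pred in zip(observed, predicted))

# Hypothetical values (illustration only)
observed = [76.0, 78.0, 85.0]
predicted = [75.5, 79.0, 83.0]

rss = residual_sum_of_squares(observed, predicted)
print(rss)  # 5.25
```

Here the residuals are 0.5, -1.0, and 2.0, so the sum of their squares is 0.25 + 1.0 + 4.0 = 5.25.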

This tutorial provides a step-by-step example of how to calculate the residual sum of squares for a regression model in Python.

Step 1: Enter the Data

For this example we’ll enter data for the number of hours spent studying, total prep exams taken, and exam score received by 14 different students:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'hours': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4, 3, 6, 5],
                   'exams': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4, 3, 2, 4],
                   'score': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90, 75, 96, 90]})

Step 2: Fit the Regression Model

Next, we’ll use the OLS() function from the statsmodels library to perform ordinary least squares regression, using “hours” and “exams” as the predictor variables and “score” as the response variable:

import statsmodels.api as sm

#define response variable
y = df['score']

#define predictor variables
x = df[['hours', 'exams']]

#add constant to predictor variables
x = sm.add_constant(x)

#fit linear regression model
model = sm.OLS(y, x).fit()

#view model summary
print(model.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  score   R-squared:                       0.722
Model:                            OLS   Adj. R-squared:                  0.671
Method:                 Least Squares   F-statistic:                     14.27
Date:                Sat, 02 Jan 2021   Prob (F-statistic):           0.000878
Time:                        15:58:35   Log-Likelihood:                -41.159
No. Observations:                  14   AIC:                             88.32
Df Residuals:                      11   BIC:                             90.24
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         71.8144      3.680     19.517      0.000      63.716      79.913
hours          5.0318      0.942      5.339      0.000       2.958       7.106
exams         -1.3186      1.063     -1.240      0.241      -3.658       1.021
==============================================================================
Omnibus:                        0.976   Durbin-Watson:                   1.270
Prob(Omnibus):                  0.614   Jarque-Bera (JB):                0.757
Skew:                          -0.245   Prob(JB):                        0.685
Kurtosis:                       1.971   Cond. No.                         12.1
==============================================================================

Step 3: Calculate the Residual Sum of Squares

We can use the following code to calculate the residual sum of squares for the model:

print(model.ssr)

293.25612951525414

The residual sum of squares turns out to be 293.256.
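
As a cross-check, the same value can be reproduced directly from the definition. The sketch below does not use statsmodels; it assumes NumPy is installed and fits the same least squares model with `numpy.linalg.lstsq`, then squares and sums the residuals by hand. The variable names are chosen for this illustration.

```python
import numpy as np

# Same data as in Step 1
hours = np.array([1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4, 3, 6, 5], dtype=float)
exams = np.array([1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4, 3, 2, 4], dtype=float)
score = np.array([76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90, 75, 96, 90], dtype=float)

# Design matrix: a column of ones for the intercept, then the two predictors
X = np.column_stack([np.ones_like(hours), hours, exams])

# Least squares fit; the second return value is the sum of squared residuals
coef, lstsq_rss, rank, _ = np.linalg.lstsq(X, score, rcond=None)

# Residual = observed - predicted, then square and sum
predicted = X @ coef
residuals = score - predicted
rss = float(np.sum(residuals ** 2))

print(coef)  # intercept, hours, exams; should match the coef column above
print(rss)   # should agree with model.ssr (about 293.256)
```

With the fitted statsmodels object from Step 2, the same quantity should also be available as `(model.resid ** 2).sum()`, since `model.resid` holds the individual residuals.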

Additional Resources

How to Perform Simple Linear Regression in Python
How to Perform Multiple Linear Regression in Python
Residual Sum of Squares Calculator
