We often use three different sum of squares values to measure how well a regression line fits a dataset:

**1. Sum of Squares Total (SST)** – The sum of squared differences between individual data points (y_{i}) and the mean of the response variable (ȳ).

- SST = Σ(y_{i} – ȳ)^{2}

**2. Sum of Squares Regression (SSR)** – The sum of squared differences between predicted data points (ŷ_{i}) and the mean of the response variable (ȳ).

- SSR = Σ(ŷ_{i} – ȳ)^{2}

**3. Sum of Squares Error (SSE)** – The sum of squared differences between predicted data points (ŷ_{i}) and observed data points (y_{i}).

- SSE = Σ(ŷ_{i} – y_{i})^{2}
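As a rough illustration of the three definitions above, the sketch below computes them directly with NumPy. The helper name `sums_of_squares` and the five-point toy dataset are assumptions made for this example only; the fitted values are those of the least-squares line ŷ = 2.2 + 0.6x for x = 1 to 5.

```python
import numpy as np

def sums_of_squares(y, y_hat):
    """Return (SST, SSR, SSE) for observed values y and fitted values y_hat."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    y_bar = y.mean()                      # mean of the response variable
    sst = np.sum((y - y_bar) ** 2)        # total variation in the response
    ssr = np.sum((y_hat - y_bar) ** 2)    # variation explained by the fit
    sse = np.sum((y_hat - y) ** 2)        # unexplained (residual) variation
    return sst, ssr, sse

# Toy data (hypothetical): y_hat = 2.2 + 0.6 * x for x = 1..5
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

sst, ssr, sse = sums_of_squares(y, y_hat)
print(sst, ssr, sse)
```

For this toy dataset the three values come out to approximately 6.0, 3.6, and 2.4, so SSR and SSE add up to SST.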

The following step-by-step example shows how to calculate each of these metrics for a given regression model in Python.

**Step 1: Create the Data**

First, let’s create a dataset that contains the number of hours studied and exam score received for 20 different students at a certain university:

    import pandas as pd

    #create pandas DataFrame
    df = pd.DataFrame({'hours': [1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
                                 3, 4, 4, 4, 5, 5, 6, 7, 7, 8],
                       'score': [68, 76, 74, 80, 76, 78, 81, 84, 86, 83,
                                 88, 85, 89, 94, 93, 94, 96, 89, 92, 97]})

    #view first five rows of DataFrame
    df.head()

       hours  score
    0      1     68
    1      1     76
    2      1     74
    3      2     80
    4      2     76

**Step 2: Fit a Regression Model**

Next, we’ll use the **OLS()** function from the statsmodels library to fit a simple linear regression model using score as the response variable and hours as the predictor variable:

    import statsmodels.api as sm

    #define response variable
    y = df['score']

    #define predictor variable
    x = df[['hours']]

    #add constant to predictor variables
    x = sm.add_constant(x)

    #fit linear regression model
    model = sm.OLS(y, x).fit()
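To make it clearer what the fitted model contains, here is a minimal NumPy-only sketch of the closed-form least-squares estimates for a single predictor. It is not how statsmodels is implemented internally; the variable names (`hours`, `score`, `fitted`) are chosen for this example, and `fitted` plays the role of `model.fittedvalues`.

```python
import numpy as np

# Same data as in Step 1, as plain NumPy arrays
hours = np.array([1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
                  3, 4, 4, 4, 5, 5, 6, 7, 7, 8], dtype=float)
score = np.array([68, 76, 74, 80, 76, 78, 81, 84, 86, 83,
                  88, 85, 89, 94, 93, 94, 96, 89, 92, 97], dtype=float)

# Closed-form least-squares estimates for simple linear regression
x_bar, y_bar = hours.mean(), score.mean()
slope = np.sum((hours - x_bar) * (score - y_bar)) / np.sum((hours - x_bar) ** 2)
intercept = y_bar - slope * x_bar

# Fitted values: the predicted score for each student
fitted = intercept + slope * hours

print(round(slope, 4), round(intercept, 4))
```

The slope works out to roughly 3.25, meaning each additional hour studied is associated with an increase of about 3.25 points in the predicted exam score.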

**Step 3: Calculate SST, SSR, and SSE**

Lastly, we can use the following formulas to calculate the SST, SSR, and SSE values of the model:

    import numpy as np

    #calculate sse
    sse = np.sum((model.fittedvalues - df.score)**2)
    print(sse)

    331.07488479262696

    #calculate ssr
    ssr = np.sum((model.fittedvalues - df.score.mean())**2)
    print(ssr)

    917.4751152073725

    #calculate sst
    sst = ssr + sse
    print(sst)

    1248.5499999999995

The metrics turn out to be:

- **Sum of Squares Total (SST):** 1248.55
- **Sum of Squares Regression (SSR):** 917.4751
- **Sum of Squares Error (SSE):** 331.0749

We can verify that SST = SSR + SSE:

- SST = SSR + SSE
- 1248.55 = 917.4751 + 331.0749
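Because the code in Step 3 defines `sst` as `ssr + sse`, the identity holds there by construction. A more independent check computes SST directly from its definition and then compares the two sides. The sketch below does this with NumPy only, refitting the line with `np.polyfit` so that it runs on its own; that refit is a stand-in for the statsmodels model and is an assumption of this example.

```python
import numpy as np

hours = np.array([1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
                  3, 4, 4, 4, 5, 5, 6, 7, 7, 8], dtype=float)
score = np.array([68, 76, 74, 80, 76, 78, 81, 84, 86, 83,
                  88, 85, 89, 94, 93, 94, 96, 89, 92, 97], dtype=float)

# Least-squares line; polyfit returns coefficients from highest degree down
slope, intercept = np.polyfit(hours, score, deg=1)
fitted = intercept + slope * hours

sst = np.sum((score - score.mean()) ** 2)    # directly from the definition
ssr = np.sum((fitted - score.mean()) ** 2)
sse = np.sum((fitted - score) ** 2)

# Compare with a tolerance, since floating-point sums rarely match exactly
print(np.isclose(sst, ssr + sse))  # prints True
```

With the statsmodels model from Step 2, the same quantities are also available as attributes: `model.centered_tss` (SST), `model.ess` (SSR), and `model.ssr` (SSE). Note that statsmodels uses `ssr` for the sum of squared residuals, which is what this tutorial calls SSE.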
