Often you may want to extract a summary of a regression model created using scikit-learn in Python.

Unfortunately, scikit-learn offers few built-in functions for summarizing a regression model, since the library is designed mainly for prediction rather than statistical inference.

So, if you’re interested in getting a summary of a regression model in Python, you have two options:

**1.** Use limited functions from scikit-learn.

**2.** Use statsmodels instead.

The examples below show how to use each method in practice with the following pandas DataFrame:

    import pandas as pd

    #create DataFrame
    df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                       'x2': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4],
                       'y': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})

    #view first five rows of DataFrame
    df.head()

       x1  x2   y
    0   1   1  76
    1   2   3  78
    2   2   3  85
    3   4   5  88
    4   2   2  72

**Method 1: Get Regression Model Summary from Scikit-Learn**

We can use the following code to fit a multiple linear regression model using scikit-learn:

    from sklearn.linear_model import LinearRegression

    #initiate linear regression model
    model = LinearRegression()

    #define predictor and response variables
    X, y = df[['x1', 'x2']], df.y

    #fit regression model
    model.fit(X, y)

We can then use the following code to extract the regression coefficients of the model along with the R-squared value of the model:

    #display regression coefficients and R-squared value of model
    print(model.intercept_, model.coef_, model.score(X, y))

    70.4828205704 [ 5.7945 -1.1576] 0.766742556527

Using this output, we can write the equation for the fitted regression model:

y = 70.48 + 5.79x_{1} – 1.16x_{2}

We can also see that the R^{2} value of the model is 0.7667.

This means that **76.67%** of the variation in the response variable can be explained by the two predictor variables in the model.
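To make the meaning of this number concrete, the R-squared can be recomputed by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The following is a minimal sketch, assuming NumPy is available; it refits the same model with `np.linalg.lstsq` instead of scikit-learn, and the variable names are illustrative only:

```python
import numpy as np

# same data as the DataFrame above
x1 = np.array([1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4], dtype=float)
x2 = np.array([1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4], dtype=float)
y = np.array([76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90], dtype=float)

# design matrix with a leading column of ones for the intercept
A = np.column_stack([np.ones_like(x1), x1, x2])

# ordinary least squares solution: [intercept, coef_x1, coef_x2]
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# R-squared = 1 - SS_residual / SS_total
residuals = y - A @ beta
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(beta, r_squared)
```

The coefficients and R-squared printed here should agree with the scikit-learn output above up to rounding.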

Although this output is useful, we still don’t know the overall F-statistic of the model, the p-values of the individual regression coefficients, and other useful metrics that can help us understand how well the model fits the dataset.
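Some of these metrics can be derived from the R-squared value alone. As a rough sketch, the helper below (a hypothetical function, not part of scikit-learn) applies the standard formulas for the adjusted R-squared and the overall F-statistic of a linear model with an intercept, given the number of observations `n` and the number of predictors `k`:

```python
def adjusted_r2_and_f(r2, n, k):
    """Return (adjusted R-squared, overall F-statistic) for a linear model
    with an intercept, n observations, and k predictors."""
    df_resid = n - k - 1  # residual degrees of freedom
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    f_stat = (r2 / k) / ((1 - r2) / df_resid)
    return adj_r2, f_stat

# values reported above: R-squared of 0.766742556527, 11 rows, 2 predictors
adj_r2, f_stat = adjusted_r2_and_f(0.766742556527, n=11, k=2)
print(round(adj_r2, 3), round(f_stat, 2))  # 0.708 13.15
```

The p-values of the individual coefficients are harder to obtain this way, since they also require the standard error of each coefficient.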

**Method 2: Get Regression Model Summary from Statsmodels**

If you’re interested in extracting a summary of a regression model in Python, you’re better off using the **statsmodels** package.

The following code shows how to use this package to fit the same multiple linear regression model as the previous example and extract the model summary:

    import statsmodels.api as sm

    #define response variable
    y = df['y']

    #define predictor variables
    x = df[['x1', 'x2']]

    #add constant to predictor variables
    x = sm.add_constant(x)

    #fit linear regression model
    model = sm.OLS(y, x).fit()

    #view model summary
    print(model.summary())

                                OLS Regression Results
    ==============================================================================
    Dep. Variable:                      y   R-squared:                       0.767
    Model:                            OLS   Adj. R-squared:                  0.708
    Method:                 Least Squares   F-statistic:                     13.15
    Date:                Fri, 01 Apr 2022   Prob (F-statistic):            0.00296
    Time:                        11:10:16   Log-Likelihood:                -31.191
    No. Observations:                  11   AIC:                             68.38
    Df Residuals:                       8   BIC:                             69.57
    Df Model:                           2
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          t      P>|t|      [0.025      0.975]
    ------------------------------------------------------------------------------
    const         70.4828      3.749     18.803      0.000      61.839      79.127
    x1             5.7945      1.132      5.120      0.001       3.185       8.404
    x2            -1.1576      1.065     -1.087      0.309      -3.613       1.298
    ==============================================================================
    Omnibus:                        0.198   Durbin-Watson:                   1.240
    Prob(Omnibus):                  0.906   Jarque-Bera (JB):                0.296
    Skew:                          -0.242   Prob(JB):                        0.862
    Kurtosis:                       2.359   Cond. No.                         10.7
    ==============================================================================

Notice that the regression coefficients and the R-squared value match those calculated by scikit-learn, but we’re also provided with a ton of other useful metrics for the regression model.

For example, we can see the p-values for each individual predictor variable:

- p-value for x_{1} = 0.001
- p-value for x_{2} = 0.309

We can also see the overall F-statistic of the model, the adjusted R-squared value, the AIC value of the model, and much more.
