How to Interpret Null & Residual Deviance (With Examples)


Whenever you fit a generalized linear model (like logistic regression, Poisson regression, etc.), most statistical software will produce values for the null deviance and residual deviance of the model.

The null deviance tells us how well the response variable can be predicted by a model with only an intercept term.

The residual deviance tells us how well the response variable can be predicted by a model with p predictor variables. The lower the value, the better the model is able to predict the value of the response variable.
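To make these two definitions concrete, here is a minimal sketch in R. It assumes the built-in mtcars dataset and the predictors wt and hp, which are chosen only for illustration and are not part of this tutorial's example. It shows that the null deviance reported for a model matches the deviance of an intercept-only model fit to the same response:

```r
# Illustration only: mtcars and the predictors wt and hp are assumptions,
# not part of this tutorial's example

#fit logistic regression model with two predictors
full_model <- glm(am ~ wt + hp, family = "binomial", data = mtcars)

#fit model with only an intercept term
null_model <- glm(am ~ 1, family = "binomial", data = mtcars)

#null deviance stored in the full model
full_model$null.deviance

#deviance of the intercept-only model (matches the null deviance)
deviance(null_model)

#residual deviance of the full model (lower than the null deviance)
deviance(full_model)
```

The gap between the last two values is the improvement in fit that comes from adding the predictors.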

To determine if a model is “useful” we can compute the Chi-Square statistic as:

X² = Null deviance – Residual deviance

with p degrees of freedom, where p is the number of predictor variables in the model.

We can then find the p-value associated with this Chi-Square statistic. The lower the p-value, the stronger the evidence that the model fits the dataset better than a model with just an intercept term.
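The calculation described above can be sketched in R with the pchisq() function. The helper name deviance_p_value and the deviance values passed to it are made up for illustration; they do not come from a real model:

```r
# Hypothetical helper (not from any package): p-value for the
# Chi-Square statistic X^2 = null deviance - residual deviance
deviance_p_value <- function(null_dev, resid_dev, df) {
  chi_sq <- null_dev - resid_dev
  #upper-tail probability of the Chi-Square distribution
  pchisq(chi_sq, df = df, lower.tail = FALSE)
}

#made-up deviances for illustration: X^2 = 100 - 80 = 20 with 2 degrees of freedom
p_value <- deviance_p_value(null_dev = 100, resid_dev = 80, df = 2)
p_value
```

Setting lower.tail = FALSE returns the probability of a Chi-Square value at least as large as the one observed, which is the p-value for this test.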

The following example shows how to interpret null and residual deviance for a logistic regression model in R.

Example: Interpreting Null & Residual Deviance

For this example, we’ll use the Default dataset from the ISLR package. We can use the following code to load and view a summary of the dataset:

#load dataset
data <- ISLR::Default

#view summary of dataset
summary(data)

 default    student       balance           income     
 No :9667   No :7056   Min.   :   0.0   Min.   :  772  
 Yes: 333   Yes:2944   1st Qu.: 481.7   1st Qu.:21340  
                       Median : 823.6   Median :34553  
                       Mean   : 835.4   Mean   :33517  
                       3rd Qu.:1166.3   3rd Qu.:43808  
                       Max.   :2654.3   Max.   :73554 

This dataset contains the following information about 10,000 individuals:

  • default: Indicates whether or not an individual defaulted.
  • student: Indicates whether or not an individual is a student.
  • balance: Average balance carried by an individual.
  • income: Income of the individual.

We will use student status, bank balance, and income to build a logistic regression model that predicts the probability that a given individual defaults:

#fit logistic regression model
model <- glm(default~balance+student+income, family="binomial", data=data)

#view model summary
summary(model)

Call:
glm(formula = default ~ balance + student + income, family = "binomial", 
    data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4691  -0.1418  -0.0557  -0.0203   3.7383  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.087e+01  4.923e-01 -22.080  < 2e-16 ***
balance      5.737e-03  2.319e-04  24.738  < 2e-16 ***
studentYes  -6.468e-01  2.363e-01  -2.738  0.00619 ** 
income       3.033e-06  8.203e-06   0.370  0.71152    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6  on 9999  degrees of freedom
Residual deviance: 1571.5  on 9996  degrees of freedom
AIC: 1579.5

Number of Fisher Scoring iterations: 8

We can observe the following values in the output for the null and residual deviance:

  • Null deviance: 2920.6 with df = 9999
  • Residual deviance: 1571.5 with df = 9996

We can use these values to calculate the X² statistic of the model:

  • X² = Null deviance – Residual deviance
  • X² = 2920.6 – 1571.5
  • X² = 1349.1

There are p = 3 predictor variables, so the statistic has 3 degrees of freedom (9999 – 9996 = 3).

We can use the Chi-Square to P-Value Calculator to find that an X² value of 1349.1 with 3 degrees of freedom has a p-value that rounds to 0.000000.

Since this p-value is much less than .05, we would conclude that the model fits the data significantly better than a model with only an intercept term, so it is useful for predicting the probability that a given individual defaults.
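As an alternative to an online calculator, the same test can be sketched directly in R. The deviances and degrees of freedom below are copied from the model summary output, and the variable names are only illustrative. If the fitted model object is available, the same quantities can be read from model$null.deviance, model$deviance, model$df.null and model$df.residual:

```r
#values copied from the model summary output
null_dev <- 2920.6
resid_dev <- 1571.5

#degrees of freedom: difference between null and residual df
deg_free <- 9999 - 9996

#calculate Chi-Square statistic
chi_sq <- null_dev - resid_dev
chi_sq

#calculate upper-tail p-value
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)
p_value
```

The resulting p-value is far too small to show in six decimal places, which is why a calculator displays it as 0.000000.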

Additional Resources

The following tutorials explain how to perform logistic regression in practice in both R and Python:

How to Perform Logistic Regression in R
How to Perform Logistic Regression in Python
