Whenever you fit a **generalized linear model** (such as logistic regression or Poisson regression), most statistical software will produce values for the **null deviance** and **residual deviance** of the model.

The **null deviance** tells us how well the response variable can be predicted by a model with only an intercept term.

The **residual deviance** tells us how well the response variable can be predicted by a model with *p* predictor variables. The lower the value, the better the model is able to predict the value of the response variable.

To determine if a model is “useful,” we can compute the Chi-Square statistic as:

**X²** = Null deviance – Residual deviance

with *p* degrees of freedom.

We can then find the p-value associated with this Chi-Square statistic. The lower the p-value, the better the model is able to fit the dataset compared to a model with just an intercept term.
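In R, this test can be run directly from a fitted `glm` object, since the object stores both deviances and their degrees of freedom. Here is a minimal sketch, assuming `fit` is any fitted `glm` model:

```r
#Chi-Square statistic: difference between null and residual deviance
chisq_stat <- fit$null.deviance - fit$deviance

#degrees of freedom: difference in residual df equals the number of predictors p
df <- fit$df.null - fit$df.residual

#p-value from the upper tail of the Chi-Square distribution
pchisq(chisq_stat, df = df, lower.tail = FALSE)
```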

The following example shows how to interpret null and residual deviance for a logistic regression model in R.

**Example: Interpreting Null & Residual Deviance**

For this example, we’ll use the **Default** dataset from the ISLR package. We can use the following code to load and view a summary of the dataset:

```r
#load dataset
data <- ISLR::Default

#view summary of dataset
summary(data)

 default    student       balance           income     
 No :9667   No :7056   Min.   :   0.0   Min.   :  772  
 Yes: 333   Yes:2944   1st Qu.: 481.7   1st Qu.:21340  
                       Median : 823.6   Median :34553  
                       Mean   : 835.4   Mean   :33517  
                       3rd Qu.:1166.3   3rd Qu.:43808  
                       Max.   :2654.3   Max.   :73554  
```

This dataset contains the following information about 10,000 individuals:

- **default:** Indicates whether or not an individual defaulted.
- **student:** Indicates whether or not an individual is a student.
- **balance:** Average balance carried by an individual.
- **income:** Income of the individual.

We will use student status, bank balance, and income to build a logistic regression model that predicts the probability that a given individual defaults:

```r
#fit logistic regression model
model <- glm(default ~ balance + student + income, family = "binomial", data = data)

#view model summary
summary(model)

Call:
glm(formula = default ~ balance + student + income, family = "binomial", 
    data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.4691  -0.1418  -0.0557  -0.0203   3.7383  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.087e+01  4.923e-01 -22.080  < 2e-16 ***
balance      5.737e-03  2.319e-04  24.738  < 2e-16 ***
studentYes  -6.468e-01  2.363e-01  -2.738  0.00619 ** 
income       3.033e-06  8.203e-06   0.370  0.71152    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2920.6  on 9999  degrees of freedom
Residual deviance: 1571.5  on 9996  degrees of freedom
AIC: 1579.5

Number of Fisher Scoring iterations: 8
```

We can observe the following values in the output for the null and residual deviance:

- **Null deviance:** 2920.6 with df = 9999
- **Residual deviance:** 1571.5 with df = 9996

We can use these values to calculate the X² statistic of the model:

- X² = Null deviance – Residual deviance
- X² = 2920.6 – 1571.5
- X² = 1349.1

The degrees of freedom equal the number of predictor variables, *p* = 3 (equivalently, the difference between the null and residual degrees of freedom: 9999 – 9996 = 3).

We can use the Chi-Square to P-Value Calculator to find that an X² value of 1349.1 with 3 degrees of freedom has a p-value of 0.000000.
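The same calculation can also be done directly in R with `pchisq`; the returned p-value is vanishingly small:

```r
#p-value for a Chi-Square statistic of 1349.1 with 3 degrees of freedom
pchisq(1349.1, df = 3, lower.tail = FALSE)
```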

Since this p-value is much less than .05, we would conclude that the model is highly useful for predicting the probability that a given individual defaults.
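As a cross-check, R can perform the same likelihood ratio test with a built-in function by explicitly fitting the intercept-only model and comparing the two fits. A sketch using the `data` and `model` objects from above:

```r
#fit the intercept-only (null) model
null_model <- glm(default ~ 1, family = "binomial", data = data)

#likelihood ratio test: reports the deviance difference and its p-value
anova(null_model, model, test = "Chisq")
```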

**Additional Resources**

The following tutorials explain how to perform logistic regression in practice in both R and Python:

How to Perform Logistic Regression in R

How to Perform Logistic Regression in Python
