Logistic Regression is a statistical method that we use to fit a regression model when the response variable is binary. To assess how well a logistic regression model fits a dataset, we can look at the following two metrics:
- Sensitivity: The probability that the model predicts a positive outcome for an observation when indeed the outcome is positive.
- Specificity: The probability that the model predicts a negative outcome for an observation when indeed the outcome is negative.
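As a minimal sketch of these two definitions, the snippet below counts true/false positives and negatives for a set of 0/1 outcomes and 0/1 predictions. The function name and the example labels are made up for illustration and are not part of the tutorial's dataset.

```python
def sensitivity_specificity(actual, predicted):
    """Return (sensitivity, specificity) for 0/1 outcomes and 0/1 predictions."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn)  # share of actual positives predicted positive
    specificity = tn / (tn + fp)  # share of actual negatives predicted negative
    return sensitivity, specificity


# Hypothetical outcomes and predictions (illustration only).
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(actual, predicted)
print(sens, spec)
```

With these made-up labels, three of the four positives and three of the four negatives are classified correctly, so both metrics come out to 0.75.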
One easy way to visualize these two metrics is by creating a ROC curve, which is a plot that displays the sensitivity and specificity of a logistic regression model.
This tutorial explains how to create and interpret a ROC curve in SPSS.
Example: ROC Curve in SPSS
Suppose we have the following dataset that shows whether or not a basketball player got drafted into the NBA (0 = no, 1 = yes) along with their average points per game in college:
To create a ROC curve for this dataset, click the Analyze tab, then Classify, then ROC Curve:
In the new window that pops up, drag the variable draft into the box labelled State Variable. Define the Value of the State Variable to be 1. (This is the value that indicates a player got drafted). Drag the variable points into the box labelled Test Variable.
Check the boxes next to With diagonal reference line and Coordinate points of the ROC Curve. Then click OK.
Here is how to interpret the output:
Case Processing Summary:
This table displays the total number of positive and negative cases in the dataset. In this example, 8 players got drafted (positive result) and 6 players did not get drafted (negative result):
ROC Curve:
The ROC (Receiver Operating Characteristic) curve is a plot of sensitivity vs. 1 – specificity as the cut-off point moves across the range of the test variable (here, average points per game):
A model with high sensitivity and high specificity will have a ROC curve that hugs the top left corner of the plot. A model with low sensitivity and low specificity will have a curve that is close to the 45-degree diagonal line.
We can see that the ROC curve (the blue line) in this example hugs the top left corner of the plot, which indicates that the model does a good job of predicting whether or not players will get drafted, based on their average points per game.
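To make the construction of the curve concrete, here is a rough sketch that sweeps a cut-off across a test variable and records sensitivity and 1 – specificity at each step. It assumes an observation is predicted positive when its value is greater than or equal to the cut-off, and that candidate cut-offs are the midpoints between neighbouring observed values plus one below the minimum and one above the maximum. The function name and the six data points are hypothetical and are not the tutorial's dataset.

```python
def roc_points(scores, labels):
    """Return (cutoff, sensitivity, 1 - specificity) for each candidate cut-off.

    An observation is predicted positive when its score is >= the cut-off.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    values = sorted(set(scores))

    # Candidate cut-offs: one below the minimum, the midpoints between
    # neighbouring observed values, and one above the maximum.
    cutoffs = [values[0] - 1]
    cutoffs += [(a + b) / 2 for a, b in zip(values, values[1:])]
    cutoffs.append(values[-1] + 1)

    points = []
    for c in cutoffs:
        sensitivity = sum(s >= c for s in positives) / len(positives)
        false_positive_rate = sum(s >= c for s in negatives) / len(negatives)
        points.append((c, sensitivity, false_positive_rate))
    return points


# Hypothetical data (illustration only): points per game and draft status.
points_per_game = [5, 8, 12, 15, 18, 22]
drafted = [0, 0, 1, 0, 1, 1]

curve = roc_points(points_per_game, drafted)
for cutoff, sens, fpr in curve:
    print(f"cutoff={cutoff:5.1f}  sensitivity={sens:.3f}  1-specificity={fpr:.3f}")
```

Plotting the second column against the third gives the ROC curve. The lowest cut-off always yields the point (1, 1) and the highest yields (0, 0).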
Area Under the Curve:
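The Area Under the Curve (AUC) gives us an idea of how well the model is able to distinguish between positive and negative outcomes. The AUC can range from 0 to 1. An AUC of 0.5 corresponds to the diagonal reference line, that is, a model that does no better than random guessing. The closer the AUC is to 1, the better the model is at correctly classifying outcomes.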
We can see that the AUC for this particular logistic regression model is .948, which is extremely high. This indicates that the model does a good job of predicting whether or not a player will get drafted.
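One way to build intuition for the AUC is that it equals the probability that a randomly chosen positive case has a higher test value than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way by comparing every positive/negative pair. The function name and data are hypothetical, and this is not necessarily how SPSS computes the value internally.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data (illustration only): points per game and draft status.
points_per_game = [5, 8, 12, 15, 18, 22]
drafted = [0, 0, 1, 0, 1, 1]

area = auc(points_per_game, drafted)
print(round(area, 3))
```

In this made-up example, 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9.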
Coordinates of the Curve:
This last table displays the sensitivity and 1 – specificity of the ROC curve for various cut-off points.
If we set the cut-off point to 8.50, we predict that any player who scores less than 8.50 points per game will not get drafted and that any player who scores more than 8.50 points per game will get drafted.
Using this as a cut-off point, our sensitivity would be 100% (since all 8 players who got drafted scored more than 8.50 points per game) and our 1 – specificity would be 66.7% (since 4 of the 6 players who did not get drafted also scored more than 8.50 points per game and would be incorrectly predicted to get drafted).
The table above allows us to see the sensitivity and 1 – specificity for every potential cut-off point.
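The arithmetic behind a single row of this table can be sketched as follows. The point values below are invented, since the tutorial's raw data is not reproduced here. Only the group sizes (8 drafted, 6 not drafted) follow the Case Processing Summary, and the values were chosen so that a cut-off of 8.50 reproduces the 100% and 66.7% figures. The function name is hypothetical.

```python
def rates_at_cutoff(scores, labels, cutoff):
    """Return (sensitivity, 1 - specificity) when score > cutoff means 'positive'."""
    true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s > cutoff)
    false_pos = sum(1 for s, y in zip(scores, labels) if y == 0 and s > cutoff)
    n_pos = sum(labels)            # number of actual positives
    n_neg = len(labels) - n_pos    # number of actual negatives
    return true_pos / n_pos, false_pos / n_neg


# Invented values (NOT the tutorial's data): 6 undrafted and 8 drafted players.
undrafted_points = [4, 7, 9, 10, 11, 13]
drafted_points = [12, 14, 15, 17, 19, 20, 22, 25]

scores = undrafted_points + drafted_points
labels = [0] * len(undrafted_points) + [1] * len(drafted_points)

sens, one_minus_spec = rates_at_cutoff(scores, labels, 8.50)
print(f"sensitivity={sens:.3f}, 1-specificity={one_minus_spec:.3f}")
```

Here all 8 drafted players fall above the cut-off (sensitivity 8/8), while 4 of the 6 undrafted players also fall above it (1 – specificity 4/6).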