One way to visualize the performance of classification models in machine learning is by creating a ROC curve, which stands for “receiver operating characteristic” curve.
This type of curve displays the sensitivity and specificity of a classification model:
- Sensitivity: The probability that the model predicts a positive outcome for an observation when the outcome is indeed positive.
- Specificity: The probability that the model predicts a negative outcome for an observation when the outcome is indeed negative.
The x-axis of a ROC curve represents (1- Specificity) and the y-axis represents the Sensitivity:
The more that the ROC curve hugs the top left corner of the plot, the better the model does at classifying the data into categories.
To quantify this, we can calculate the AUC (area under the curve) which tells us how much of the plot is located under the curve.
The closer AUC is to 1, the better the model.
When comparing two ROC curves to determine which classification model is best, we often look at which ROC curve “hugs” the top left corner of the plot more and thus has a higher AUC value.
Example: How to Compare Two ROC Curves
Suppose we fit a logistic regression model and a gradient boosted model to a dataset to predict the outcome of some response variable.
Suppose we then create ROC curves to visualize the performance of each model:
The blue line shows the ROC curve for the logistic regression model and the orange line shows the ROC curve for the gradient boosted model.
From our plot we can see the following AUC values for each model:
- AUC of logistic regression model: 0.7902
- AUC of gradient boosted model: 0.9712
Since the gradient boosted model has a higher AUC value, we would say that it does a better job of predicted the outcome of the response variable.
Note: In this example we only compared two ROC curves, but it’s possible to fit several different classification models to a dataset and compare even more ROC curves to determine the best model to use.
Additional Resources
The following tutorials provide additional information about classification models and ROC curves:
Introduction to Logistic Regression
How to Interpret a ROC Curve
What is Considered a Good AUC Score?