The Akaike information criterion (AIC) is a metric that is used to compare the fit of different regression models.
It is calculated as:
AIC = 2K – 2ln(L)
- K: The number of model parameters.
- ln(L): The log-likelihood of the model. This tells us how likely the data are, given the model.
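This calculation is simple enough to sketch in a few lines of Python (the function name `aic` and the example values are mine):

```python
def aic(k: int, log_likelihood: float) -> float:
    """Compute AIC = 2K - 2ln(L) from the parameter count K and log-likelihood ln(L)."""
    return 2 * k - 2 * log_likelihood

# e.g. a hypothetical model with 3 parameters and a log-likelihood of -45.2
print(aic(3, -45.2))  # 96.4
```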
Once you’ve fit several regression models, you can compare the AIC value of each model. The model with the lowest AIC offers the best fit.
One question students often have about AIC is: How do I interpret negative AIC values?
The simple answer: The lower the AIC value, the better the fit of the model. The absolute value of AIC is not important; it can be positive or negative.
For example, if Model 1 has an AIC value of -56.5 and Model 2 has an AIC value of -103.3, then Model 2 offers a better fit. It doesn’t matter if both AIC values are negative.
Understanding Negative AIC Values
It’s easy to see how a given regression model could result in a negative AIC value if we simply look at the formula used to calculate AIC:
AIC = 2K – 2ln(L)
Suppose we have a model with 7 parameters and a log-likelihood of 70.
We would calculate the AIC of this model as:
AIC = 2*7 – 2*70 = -126
We could then compare this AIC value to that of other regression models to determine which model provides the best fit.
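The arithmetic from this worked example can be checked directly in Python:

```python
k = 7                # number of model parameters
log_likelihood = 70  # ln(L)

aic = 2 * k - 2 * log_likelihood
print(aic)  # -126
```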
Textbook References on Negative AIC Values
A helpful textbook reference on negative AIC values comes from Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach on page 62:
Usually, AIC is positive; however, it can be shifted by any additive constant, and some shifts can result in negative values of AIC… It is not the absolute size of the AIC value, it is the relative values over the set of models considered, and particularly the differences between AIC values, that are important.
Another useful reference comes from Serious Stats: A Guide to Advanced Statistics for the Behavioral Sciences on page 402:
As with likelihood, the absolute value of AIC is largely meaningless (being determined by the arbitrary constant). As this constant depends on the data, AIC can be used to compare models fitted on identical samples.
The best model from the set of plausible models being considered is therefore the one with the smallest AIC value (the least information loss relative to the true model).
As noted in both textbooks, the absolute value of the AIC is not important. We merely use AIC values to compare the fit of models and the model with the lowest AIC value is best.