The Akaike information criterion (AIC) is a metric that is used to compare the fit of several regression models.
It is calculated as:
AIC = 2K - 2ln(L)
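- K: The number of estimated model parameters. For a linear regression model, K counts the intercept and the error variance (so K = 2 before any predictors are added) plus one for each predictor variable. A model with just one predictor variable therefore has K = 2 + 1 = 3.
- ln(L): The log-likelihood of the model, evaluated at its fitted parameter values. Most statistical software calculates this value for you automatically.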
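For a linear regression with normally distributed errors, the maximized log-likelihood can be written in terms of the sample size n and the error sum of squares (SSE), which makes the trade-off between fit and penalty concrete. This is a standard derivation, shown here as a sketch assuming Gaussian errors and the maximum likelihood variance estimate SSE/n:

```latex
\begin{aligned}
\ln(L) &= -\frac{n}{2}\left[\ln(2\pi) + \ln\!\left(\frac{\mathrm{SSE}}{n}\right) + 1\right] \\
\mathrm{AIC} = 2K - 2\ln(L) &= n\ln\!\left(\frac{\mathrm{SSE}}{n}\right) + 2K + n\left[\ln(2\pi) + 1\right]
\end{aligned}
```

Adding a predictor never increases SSE, so the first term can only go down, but it also raises K by one, adding 2 to the penalty. The new predictor lowers AIC only if it shrinks n ln(SSE/n) by more than 2.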
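AIC rewards models that fit the data well (a higher log-likelihood) while penalizing models that use an excessive number of parameters.
Once you’ve fit several regression models to the same dataset, you can compare their AIC values: the lower the AIC, the better the balance between fit and complexity. An AIC value has no meaning on its own and should only be compared with AIC values from other models fit to the same observations.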
The following example shows how to calculate the AIC for various regression models in SAS.
Example: How to Calculate AIC in SAS
Suppose we would like to fit three different linear regression models to predict the exam score that students will receive in a particular class.
Here are the predictor variables we’ll use in each model:
- Predictor variables in Model 1: hours spent studying
- Predictor variables in Model 2: practice exams taken
- Predictor variables in Model 3: hours spent studying and practice exams taken
First, we’ll use the following code to create a dataset that contains this information for 20 students:
/*create dataset*/
data exam_data;
    input hours prep_exams score;
    datalines;
1 1 76
2 3 78
2 3 85
4 5 88
2 2 72
1 2 69
5 1 94
4 1 94
2 0 88
4 3 92
4 4 90
3 3 75
6 2 96
5 4 90
3 4 82
4 4 85
6 5 99
2 1 83
1 0 62
2 1 76
;
run;
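Next, we’ll use PROC REG to fit each of these regression models. The MODEL statement option selection=adjrsq fits every subset of the listed predictors (which here gives exactly the three models above), and the sse and aic options add each model’s error sum of squares and AIC value to the output: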
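/*fit multiple linear regression models and calculate AIC for each model*/
proc reg data=exam_data;
    /*selection=adjrsq evaluates every subset of hours and prep_exams;
      sse and aic print those statistics for each subset*/
    model score = hours prep_exams / selection=adjrsq sse aic;
run;
quit;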
From the output we can see the AIC values for each model:
- AIC with hours as predictor variable: 68.4537
- AIC with hours and exams as predictor variables: 69.9507
- AIC with exams as predictor variable: 91.4967
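These values can be checked by hand. PROC REG computes AIC as n ln(SSE/n) + 2p, where p is the number of parameters including the intercept. This version leaves out the error variance and the constant terms of the likelihood-based formula, but because those are identical for every model fit to the same n observations, the ranking of the models is unchanged. For the hours-only model (n = 20, p = 2), a least squares fit of score on hours gives SSE ≈ 501.91 (computed for this illustration; it should agree with the SSE column that the sse option adds to the output):

```latex
\begin{aligned}
\mathrm{AIC} &= n\ln\!\left(\frac{\mathrm{SSE}}{n}\right) + 2p \\
&= 20\ln\!\left(\frac{501.91}{20}\right) + 2(2) \\
&\approx 20(3.2227) + 4 \approx 68.45
\end{aligned}
```

This matches the reported value of 68.4537 up to rounding.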
The model with the lowest AIC value is the one that only contains hours as the predictor variable.
Thus, among the three candidate models, we would declare the following model to be the one that best fits the data:
Score = β0 + β1(Hours Studied)
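One common way to gauge how strongly the data favor this model is to subtract the lowest AIC from each model’s AIC. This is a general practice rather than something the SAS output above reports directly:

```latex
\begin{aligned}
\Delta_{\text{hours}} &= 68.4537 - 68.4537 = 0 \\
\Delta_{\text{hours + exams}} &= 69.9507 - 68.4537 = 1.4970 \\
\Delta_{\text{exams}} &= 91.4967 - 68.4537 = 23.0430
\end{aligned}
```

Under a commonly cited rule of thumb, a difference below about 2 means the data give little basis for choosing between two models, while a difference above about 10 is strong evidence against the higher-AIC model. Read that way, the exams-only model is clearly worse, while the model with both predictors is nearly as well supported as the hours-only model.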
Once we’ve identified this model as the best, we can fit it on its own and analyze the results, including the R-squared value and the estimated coefficients, to describe the relationship between hours studied and exam score.
The following tutorials explain how to perform other common tasks in SAS:
How to Perform Simple Linear Regression in SAS
How to Perform Multiple Linear Regression in SAS
How to Calculate R-Squared in SAS
How to Calculate RMSE in SAS