# A Complete Guide to Stepwise Regression in R This tutorial explains how to perform stepwise regression in R.

## What is Stepwise Regression?

Regression is a statistical method that allows us to understand the relationship between predictor variables and a response variable.

Stepwise regression is a procedure we can use to build a regression model from a set of candidate predictor variables by entering and removing predictors in a stepwise manner into the model until there is no statistically valid reason to enter or remove any more.

The goal of stepwise regression is to build a regression model that includes all of the predictor variables that are statistically significantly related to the response variable.

## Understanding the Stepwise Regression Procedure

The general procedure for stepwise regression is as follows:

Step1:

Step 2:

• Fit each of the one-predictor models and choose the one that produces the lowest AIC (Akaike information criterion), which is a measure of the quality of the regression model relative to all other models. That is, fit the model y ~ x1, then fit the model y ~ x2, then fit the model y ~ x3, then fit the model y ~ x4, and keep going until you have fit all one-predictor models.
• Choose the model that produces the lowest AIC value. If no model produces an AIC value that is significantly different from the intercept-only model, then stop.

Step 3:

• Suppose y ~ xproduced the model with the lowest AIC. Next, fit each of the two-predictor models that include x1 as a predictor. That is, fit the model y ~ x+ x2, then fit the model  y ~ x+ x3, and keep going until you have fit all two-predictor models.
• Choose the model that produces the lowest AIC value. If no model produces an AIC value that is significantly different from the one-predictor model, then stop. The one-predictor model is your final model.
• But, suppose x2 turned out to be the best second predictor to add, and is thus entered into the model. Now, see if entering x2 into the model somehow affected the significance of the xpredictor. If x is no longer a significant predictor, remove x1 from the model.

Step 4:

• Suppose y ~ x+ x2 produced the model with the lowest AIC. Next, fit each of the three-predictor models that include x1 and x2  as predictors. That is, fit the model y ~ x+ x2+ x3, then fit the model  y ~ x+ x2+ x4, , and keep going until you have fit all three-predictor models.
• Choose the model that produces the lowest AIC value. If no model produces an AIC value that is significantly different from the two-predictor model, then stop. The two-predictor model is your final model.
• But, suppose x3 turned out to be the best second predictor to add, and is thus entered into the model. Now, see if entering x3 into the model somehow affected the significance of the xpredictor and the xpredictor. If either predictor is no longer a significant predictor, remove that predictor from the model.

Simply continue this process until adding additional predictors no longer significantly reduces the AIC. When no additional predictor significantly reduces the AIC, you have arrived at your final model.

This process would be quite tedious to do manually, but fortunately most statistical softwares have the ability to perform this process automatically.

## Fitting a Stepwise Regression in R

Now, we’ll illustrate how to perform stepwise regression in R using the built-in dataset mtcars:

```#view first six rows of mtcars

#                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
```

Suppose we want to fit a regression model using mpg (miles per gallon) as our response variable and all of the other 10 variables in the dataset as potential predictors.

First, we will define our intercept-only model.

Then, we will define a scope to tell R that we’d like to attempt to add all possible predictors into the model in a stepwise manner.

Lastly, we will use the step() function to fit the stepwise model, which uses the following syntax:

step(intercept-only model, direction, scope)

• intercept-only model: the formula for the intercept-only model
• direction: the mode of stepwise search, can be either “both”, “backward”, or “forward” – we will use “both” since we want to attempt to enter and remove predictors at each step
• scope: a formula that specifies which predictors we’d like to attempt to enter into the model

The following code illustrates how to conduct this stepwise regression:

```#define intercept-only model
intercept_only_model <- lm(mpg ~ 1, data = mtcars)

#define total model
total_model <- lm(mpg ~ ., data = mtcars)

#perform stepwise regression
step(intercept_only_model, direction = 'both', scope = formula(total_model))

#Start:  AIC=115.94
#mpg ~ 1
#
#       Df Sum of Sq     RSS     AIC
#+ wt    1    847.73  278.32  73.217
#+ cyl   1    817.71  308.33  76.494
#+ disp  1    808.89  317.16  77.397
#+ hp    1    678.37  447.67  88.427
#+ drat  1    522.48  603.57  97.988
#+ vs    1    496.53  629.52  99.335
#+ am    1    405.15  720.90 103.672
#+ carb  1    341.78  784.27 106.369
#+ gear  1    259.75  866.30 109.552
#+ qsec  1    197.39  928.66 111.776
#<none>              1126.05 115.943
#
#Step:  AIC=73.22
#mpg ~ wt
#
#       Df Sum of Sq     RSS     AIC
#+ cyl   1     87.15  191.17  63.198
#+ hp    1     83.27  195.05  63.840
#+ qsec  1     82.86  195.46  63.908
#+ vs    1     54.23  224.09  68.283
#+ carb  1     44.60  233.72  69.628
#+ disp  1     31.64  246.68  71.356
#<none>               278.32  73.217
#+ drat  1      9.08  269.24  74.156
#+ gear  1      1.14  277.19  75.086
#+ am    1      0.00  278.32  75.217
#- wt    1    847.73 1126.05 115.943
#
#Step:  AIC=63.2
#mpg ~ wt + cyl
#
#       Df Sum of Sq    RSS    AIC
#+ hp    1    14.551 176.62 62.665
#+ carb  1    13.772 177.40 62.805
#<none>              191.17 63.198
#+ qsec  1    10.567 180.60 63.378
#+ gear  1     3.028 188.14 64.687
#+ disp  1     2.680 188.49 64.746
#+ vs    1     0.706 190.47 65.080
#+ am    1     0.125 191.05 65.177
#+ drat  1     0.001 191.17 65.198
#- cyl   1    87.150 278.32 73.217
#- wt    1   117.162 308.33 76.494
#
#Step:  AIC=62.66
#mpg ~ wt + cyl + hp
#
#       Df Sum of Sq    RSS    AIC
#<none>              176.62 62.665
#- hp    1    14.551 191.17 63.198
#+ am    1     6.623 170.00 63.442
#+ disp  1     6.176 170.44 63.526
#- cyl   1    18.427 195.05 63.840
#+ carb  1     2.519 174.10 64.205
#+ drat  1     2.245 174.38 64.255
#+ qsec  1     1.401 175.22 64.410
#+ gear  1     0.856 175.76 64.509
#+ vs    1     0.060 176.56 64.654
#- wt    1   115.354 291.98 76.750
#
#Call:
#lm(formula = mpg ~ wt + cyl + hp, data = mtcars)
#
#Coefficients:
#(Intercept)           wt          cyl           hp
#   38.75179     -3.16697     -0.94162     -0.01804
```

Our final model turns out to be mpg ~ 38.75 – 3.17*wt – 0.94*cyl – 0.02*hyp.

A total of three predictors were used out of the possible ten.

## Interpreting a Stepwise Regression in R

Let’s walk through exactly what just happened when R performed this stepwise regression.

First, we start with the intercept-only model. R tells us that the model at this point is mpg ~ 1, which has an AIC of 115.94. Then, R fits every possible one-predictor model and shows the corresponding AIC. We can see that adding the predictor wt produces the model with the lowest AIC:

```#perform stepwise regression
step(intercept_only_model, direction = 'both', scope = formula(total_model))

#Start:  AIC=115.94
#mpg ~ 1
#
#       Df Sum of Sq     RSS     AIC
#+ wt    1    847.73  278.32  73.217
#+ cyl   1    817.71  308.33  76.494
#+ disp  1    808.89  317.16  77.397
#+ hp    1    678.37  447.67  88.427
#+ drat  1    522.48  603.57  97.988
#+ vs    1    496.53  629.52  99.335
#+ am    1    405.15  720.90 103.672
#+ carb  1    341.78  784.27 106.369
#+ gear  1    259.75  866.30 109.552
#+ qsec  1    197.39  928.66 111.776
#<none>              1126.05 115.943```

R determines that the model mpg ~ wt offers a statistically significant reduction in AIC compared to the intercept-only model, so wt is entered into the model. Thus, the model at this point is mpg ~ wt, which has an AIC of 73.22.

Then, R fits every possible two-predictor model that includes wt as a predictor and shows the corresponding AIC. We can see that adding the predictor cyl produces the model with the lowest AIC at 63.198. By comparison, adding no additional variables to the model (<none>) would simply keep the AIC at 73.22. Thus, it makes sense to add cyl as a predictor into the model:

```#Step:  AIC=73.22
#mpg ~ wt
#
#       Df Sum of Sq     RSS     AIC
#+ cyl   1     87.15  191.17  63.198
#+ hp    1     83.27  195.05  63.840
#+ qsec  1     82.86  195.46  63.908
#+ vs    1     54.23  224.09  68.283
#+ carb  1     44.60  233.72  69.628
#+ disp  1     31.64  246.68  71.356
#<none>               278.32  73.217
#+ drat  1      9.08  269.24  74.156
#+ gear  1      1.14  277.19  75.086
#+ am    1      0.00  278.32  75.217
#- wt    1    847.73 1126.05 115.943```

The model at this point is mpg ~ wt + cyl, which has an AIC of 63.198Next, R fits every possible three-predictor model that includes wt and cyl as predictors and shows the corresponding AIC. We can see that adding the predictor hp produces the model with the lowest AIC at 62.665. By comparison, adding no additional variables to the model (<none>) would simply keep the AIC at 63.198. Thus, it makes sense to add hp as a predictor into the model:

```#mpg ~ wt + cyl
#
#       Df Sum of Sq    RSS    AIC
#+ hp    1    14.551 176.62 62.665
#+ carb  1    13.772 177.40 62.805
#              191.17 63.198
#+ qsec  1    10.567 180.60 63.378
#+ gear  1     3.028 188.14 64.687
#+ disp  1     2.680 188.49 64.746
#+ vs    1     0.706 190.47 65.080
#+ am    1     0.125 191.05 65.177
#+ drat  1     0.001 191.17 65.198
#- cyl   1    87.150 278.32 73.217
#- wt    1   117.162 308.33 76.494```

The model at this point is mpg ~ wt + cyl + hp, which has an AIC of 62.665. Next, R fits every possible four-predictor model that includes wt, cyl, and hp as predictors and shows the corresponding AIC. We can see that adding no additional variables to the model (<none>) produces the lowest AIC. Thus, we have arrived at our final model, which turns out to be mpg ~ wt + cyl + hpR tells us that the final fitted model is mpg ~ 38.75 – 3.17*wt – 0.94*cyl – 0.02*hyp.

```#Step:  AIC=62.66
#mpg ~ wt + cyl + hp
#
#       Df Sum of Sq    RSS    AIC
#              176.62 62.665
#- hp    1    14.551 191.17 63.198
#+ am    1     6.623 170.00 63.442
#+ disp  1     6.176 170.44 63.526
#- cyl   1    18.427 195.05 63.840
#+ carb  1     2.519 174.10 64.205
#+ drat  1     2.245 174.38 64.255
#+ qsec  1     1.401 175.22 64.410
#+ gear  1     0.856 175.76 64.509
#+ vs    1     0.060 176.56 64.654
#- wt    1   115.354 291.98 76.750
#
#Call:
#lm(formula = mpg ~ wt + cyl + hp, data = mtcars)
#
#Coefficients:
#(Intercept)           wt          cyl           hp
#   38.75179     -3.16697     -0.94162     -0.01804```

## Cautions for Using Stepwise Regression

There are several things to keep in mind when using stepwise regression, including:

• Stepwise regression leads to one single final model, but in reality there are often several equally good models that could be used.
• The final model is not guaranteed to be optimal in any specified way.
• It is possible that not all important predictors in the model have been included. It’s also possible that not all unimportant predictors have been excluded.
• The order in which the predictors are entered into the model should not be over-interpreted.