**Regression** is a statistical method that allows us to understand the relationship between predictor variables and a response variable.

**Stepwise regression** is a procedure we can use to build a regression model from a set of candidate predictor variables by entering and removing predictors in a stepwise manner into the model until there is no statistically valid reason to enter or remove any more.

The goal of stepwise regression is to build a regression model that includes all of the predictor variables that are statistically significantly related to the response variable.

**Understanding the Stepwise Regression Procedure**

The general procedure for stepwise regression is as follows:

**Step1:**

- Start with the intercept-only model. That is, start with no predictors in the model.

**Step 2:**

- Fit each of the one-predictor models and choose the one that produces the lowest AIC (Akaike information criterion), which is a measure of the quality of the regression model relative to all other models. That is, fit the model
*y ~ x*, then fit the model_{1}*y ~ x*, then fit the model_{2}*y ~ x*, then fit the model_{3}*y ~ x*, and keep going until you have fit all one-predictor models._{4} - Choose the model that produces the lowest AIC value. If no model produces an AIC value that is significantly different from the intercept-only model, then stop.

**Step 3:**

- Suppose
*y ~ x*produced the model with the lowest AIC. Next, fit each of the two-predictor models that include_{1 }*x*as a predictor. That is, fit the model_{1 }*y ~ x*, then fit the model_{1 }+ x_{2}*y ~ x*, and keep going until you have fit all two-predictor models._{1 }+ x_{3} - Choose the model that produces the lowest AIC value. If no model produces an AIC value that is significantly different from the one-predictor model, then stop. The one-predictor model is your final model.
- But, suppose
*x*_{2}*x*_{2}*x*predictor. If_{1 }*x*_{1 }*x*_{1}

**Step 4:**

- Suppose
*y ~ x*_{1 }*+ x*produced the model with the lowest AIC. Next, fit each of the three-predictor models that include_{2 }*x*and_{1 }*x*as predictors. That is, fit the model_{2 }*y ~ x*_{1 }+ x_{2}*+ x*, then fit the model_{3}*y ~ x*, and keep going until you have fit all three-predictor models._{1 }+ x_{2}+ x_{4}, - Choose the model that produces the lowest AIC value. If no model produces an AIC value that is significantly different from the two-predictor model, then stop. The two-predictor model is your final model.
- But, suppose
*x*_{3}*x*_{3}*x*predictor and the_{1 }*x*predictor. If either predictor_{2 }

Simply continue this process until adding additional predictors no longer significantly reduces the AIC. When no additional predictor significantly reduces the AIC, you have arrived at your final model.

This process would be quite tedious to do manually, but fortunately most statistical softwares have the ability to perform this process automatically.

**Fitting a Stepwise Regression in R**

Now, we’ll illustrate how to perform stepwise regression in R using the built-in dataset **mtcars**:

#view first six rows ofmtcarshead(mtcars) # mpg cyl disp hp drat wt qsec vs am gear carb #Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 #Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 #Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 #Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 #Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 #Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

Suppose we want to fit a regression model using *mpg *(miles per gallon) as our response variable and all of the other 10 variables in the dataset as potential predictors.

First, we will define our *intercept-only model*.

Then, we will define a *scope *to tell R that we’d like to attempt to add all possible predictors into the model in a stepwise manner.

Lastly, we will use the **step()** function to fit the stepwise model, which uses the following syntax:

step(intercept-only model, direction, scope)

**intercept-only model**: the formula for the intercept-only model**direction:**the mode of stepwise search, can be either “both”, “backward”, or “forward” – we will use “both” since we want to attempt to enter*and*remove predictors at each step**scope:**a formula that specifies which predictors we’d like to attempt to enter into the model

The following code illustrates how to conduct this stepwise regression:

#define intercept-only model intercept_only_model <- lm(mpg ~ 1, data = mtcars) #define total model total_model <- lm(mpg ~ ., data = mtcars) #perform stepwise regression step(intercept_only_model, direction = 'both', scope = formula(total_model)) #Start: AIC=115.94 #mpg ~ 1 # # Df Sum of Sq RSS AIC #+ wt 1 847.73 278.32 73.217 #+ cyl 1 817.71 308.33 76.494 #+ disp 1 808.89 317.16 77.397 #+ hp 1 678.37 447.67 88.427 #+ drat 1 522.48 603.57 97.988 #+ vs 1 496.53 629.52 99.335 #+ am 1 405.15 720.90 103.672 #+ carb 1 341.78 784.27 106.369 #+ gear 1 259.75 866.30 109.552 #+ qsec 1 197.39 928.66 111.776 #<none> 1126.05 115.943 # #Step: AIC=73.22 #mpg ~ wt # # Df Sum of Sq RSS AIC #+ cyl 1 87.15 191.17 63.198 #+ hp 1 83.27 195.05 63.840 #+ qsec 1 82.86 195.46 63.908 #+ vs 1 54.23 224.09 68.283 #+ carb 1 44.60 233.72 69.628 #+ disp 1 31.64 246.68 71.356 #<none> 278.32 73.217 #+ drat 1 9.08 269.24 74.156 #+ gear 1 1.14 277.19 75.086 #+ am 1 0.00 278.32 75.217 #- wt 1 847.73 1126.05 115.943 # #Step: AIC=63.2 #mpg ~ wt + cyl # # Df Sum of Sq RSS AIC #+ hp 1 14.551 176.62 62.665 #+ carb 1 13.772 177.40 62.805 #<none> 191.17 63.198 #+ qsec 1 10.567 180.60 63.378 #+ gear 1 3.028 188.14 64.687 #+ disp 1 2.680 188.49 64.746 #+ vs 1 0.706 190.47 65.080 #+ am 1 0.125 191.05 65.177 #+ drat 1 0.001 191.17 65.198 #- cyl 1 87.150 278.32 73.217 #- wt 1 117.162 308.33 76.494 # #Step: AIC=62.66 #mpg ~ wt + cyl + hp # # Df Sum of Sq RSS AIC #<none> 176.62 62.665 #- hp 1 14.551 191.17 63.198 #+ am 1 6.623 170.00 63.442 #+ disp 1 6.176 170.44 63.526 #- cyl 1 18.427 195.05 63.840 #+ carb 1 2.519 174.10 64.205 #+ drat 1 2.245 174.38 64.255 #+ qsec 1 1.401 175.22 64.410 #+ gear 1 0.856 175.76 64.509 #+ vs 1 0.060 176.56 64.654 #- wt 1 115.354 291.98 76.750 # #Call: #lm(formula = mpg ~ wt + cyl + hp, data = mtcars) # #Coefficients: #(Intercept) wt cyl hp # 38.75179 -3.16697 -0.94162 -0.01804

Our final model turns out to be ** mpg ~ 38.75 – 3.17*wt – 0.94*cyl – 0.02*hyp**.

A total of three predictors were used out of the possible ten.

**Interpreting a Stepwise Regression in R**

Let’s walk through exactly what just happened when R performed this stepwise regression.

First, we start with the intercept-only model. R tells us that the model at this point is *mpg ~ 1*, which has an AIC of **115.94**. Then, R fits every possible one-predictor model and shows the corresponding AIC. We can see that adding the predictor *wt *produces the model with the lowest AIC:

#perform stepwise regression step(intercept_only_model, direction = 'both', scope = formula(total_model)) #Start: AIC=115.94 #mpg ~ 1 # # Df Sum of Sq RSS AIC #+ wt 1 847.73 278.32 73.217 #+ cyl 1 817.71 308.33 76.494 #+ disp 1 808.89 317.16 77.397 #+ hp 1 678.37 447.67 88.427 #+ drat 1 522.48 603.57 97.988 #+ vs 1 496.53 629.52 99.335 #+ am 1 405.15 720.90 103.672 #+ carb 1 341.78 784.27 106.369 #+ gear 1 259.75 866.30 109.552 #+ qsec 1 197.39 928.66 111.776 #<none> 1126.05 115.943

R determines that the model *mpg ~ wt *offers a statistically significant reduction in AIC compared to the intercept-only model, so *wt *is entered into the model. Thus, the model at this point is *mpg ~ wt*, which has an AIC of **73.22**.

Then, R fits every possible two-predictor model that includes *wt *as a predictor and shows the corresponding AIC. We can see that adding the predictor *cyl *produces the model with the lowest AIC at **63.198**. By comparison, adding no additional variables to the model (*<none>*) would simply keep the AIC at **73.22**. Thus, it makes sense to add *cyl *as a predictor into the model:

#Step: AIC=73.22 #mpg ~ wt # # Df Sum of Sq RSS AIC #+ cyl 1 87.15 191.17 63.198 #+ hp 1 83.27 195.05 63.840 #+ qsec 1 82.86 195.46 63.908 #+ vs 1 54.23 224.09 68.283 #+ carb 1 44.60 233.72 69.628 #+ disp 1 31.64 246.68 71.356 #<none> 278.32 73.217 #+ drat 1 9.08 269.24 74.156 #+ gear 1 1.14 277.19 75.086 #+ am 1 0.00 278.32 75.217 #- wt 1 847.73 1126.05 115.943

The model at this point is *mpg ~ wt + cyl*, which has an AIC of **63.198**. Next, R fits every possible three-predictor model that includes *wt *and *cyl *as predictors and shows the corresponding AIC. We can see that adding the predictor *hp *produces the model with the lowest AIC at **62.665**. By comparison, adding no additional variables to the model (*<none>*) would simply keep the AIC at **63.198**. Thus, it makes sense to add *hp *as a predictor into the model:

#mpg ~ wt + cyl # # Df Sum of Sq RSS AIC #+ hp 1 14.551 176.62 62.665 #+ carb 1 13.772 177.40 62.805 # 191.17 63.198 #+ qsec 1 10.567 180.60 63.378 #+ gear 1 3.028 188.14 64.687 #+ disp 1 2.680 188.49 64.746 #+ vs 1 0.706 190.47 65.080 #+ am 1 0.125 191.05 65.177 #+ drat 1 0.001 191.17 65.198 #- cyl 1 87.150 278.32 73.217 #- wt 1 117.162 308.33 76.494

The model at this point is *mpg ~ wt + cyl + hp*, which has an AIC of **62.665**. Next, R fits every possible four-predictor model that includes *wt,* *cyl, *and *hp *as predictors and shows the corresponding AIC. We can see that adding no additional variables to the model (*<none>*) produces the lowest AIC. Thus, we have arrived at our final model, which turns out to be *mpg ~ wt + cyl + hp**. *R tells us that the final fitted model is ** mpg ~ 38.75 – 3.17*wt – 0.94*cyl – 0.02*hyp**.

#Step: AIC=62.66 #mpg ~ wt + cyl + hp # # Df Sum of Sq RSS AIC # 176.62 62.665 #- hp 1 14.551 191.17 63.198 #+ am 1 6.623 170.00 63.442 #+ disp 1 6.176 170.44 63.526 #- cyl 1 18.427 195.05 63.840 #+ carb 1 2.519 174.10 64.205 #+ drat 1 2.245 174.38 64.255 #+ qsec 1 1.401 175.22 64.410 #+ gear 1 0.856 175.76 64.509 #+ vs 1 0.060 176.56 64.654 #- wt 1 115.354 291.98 76.750 # #Call: #lm(formula = mpg ~ wt + cyl + hp, data = mtcars) # #Coefficients: #(Intercept) wt cyl hp # 38.75179 -3.16697 -0.94162 -0.01804

**Cautions for Using Stepwise Regression**

There are several things to keep in mind when using stepwise regression, including:

- Stepwise regression leads to one single final model, but in reality there are often several equally good models that could be used.
- The final model is not guaranteed to be optimal in any specified way.
- It is possible that not all important predictors in the model have been included. It’s also possible that not all unimportant predictors have been excluded.
- The order in which the predictors are entered into the model should not be over-interpreted.

**Further Reading:
**

**Testing the Significance of a Regression Slope**

**How to Read and Interpret a Regression Table**

**A Guide to Multicollinearity in Regression**