Linear regression is a method we can use to quantify the relationship between one or more predictor variables and a response variable.

When we use a categorical variable as a predictor in the model, each coefficient reported for that variable estimates the average difference in the response variable between one level and a baseline (reference) level, holding the other predictors constant.

By default, R uses the first level of the factor as the baseline against which all other levels are compared. For a factor created with default settings, this is the value that sorts first.
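To see which level R would treat as the baseline, it can help to inspect the factor's levels directly. The sketch below uses a small made-up vector (the name `program` here is illustrative only) and assumes R's default treatment contrasts have not been changed through `options()`. Under that assumption, the first entry returned by `levels()` is the reference level.

```r
# Illustrative factor; the values are made up for this sketch
program <- factor(c(1, 1, 2, 2, 3, 3))

# Levels are sorted when the factor is created
program_levels <- levels(program)
program_levels

# Under the default treatment contrasts, the first level is the baseline
baseline <- program_levels[1]
baseline

# The contrast matrix has one column per non-baseline level
contrasts(program)
```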

However, sometimes you may want to specify which level of the categorical variable should be used as the baseline.

You can do so with the **relevel()** function in R, which uses the following basic syntax:

**relevel(x, ref)**

where:

- **x**: An unordered factor
- **ref**: The reference level, typically expressed as a string
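As a minimal sketch of this syntax, the snippet below applies **relevel()** to a small hypothetical factor (`size` and its values are made up for illustration). It shows that the level passed to `ref` moves to the front while the remaining levels keep their relative order, and that an ordered factor is rejected with an error.

```r
# Hypothetical factor used only for illustration
size <- factor(c("low", "medium", "high", "medium"))

# Default levels are sorted alphabetically
levels(size)

# Move "medium" to the front; the remaining levels keep their order
size_releveled <- relevel(size, ref = "medium")
levels(size_releveled)

# relevel() only accepts unordered factors, so this call raises an error,
# which tryCatch() converts into the string "error"
size_ordered <- factor(size, levels = c("low", "medium", "high"), ordered = TRUE)
result <- tryCatch(relevel(size_ordered, ref = "medium"),
                   error = function(e) "error")
result
```

Note that **relevel()** changes only the order of the levels; the values stored in the factor stay the same.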

The following example shows how to use the **relevel()** function with a linear regression model in practice in R.

**Example: How to Use the relevel() Function in R**

Suppose we have the following data frame in R that contains information on three variables for 12 different basketball players:

- points scored
- hours spent practicing
- training program used

```r
#create data frame
df <- data.frame(points=c(7, 7, 9, 10, 13, 14, 12, 10, 16, 19, 22, 18),
                 hours=c(1, 2, 2, 3, 2, 6, 4, 3, 4, 5, 8, 6),
                 program=c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3))

#view data frame
df

   points hours program
1       7     1       1
2       7     2       1
3       9     2       1
4      10     3       1
5      13     2       2
6      14     6       2
7      12     4       2
8      10     3       2
9      16     4       3
10     19     5       3
11     22     8       3
12     18     6       3
```

Suppose we would like to fit the following linear regression model:

$$\text{points} = \beta_0 + \beta_1 \,\text{hours} + \beta_2 \,\text{program}$$

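In this example, program is a **categorical variable** that can take on three possible categories: program 1, program 2, or program 3. Because it is categorical, R replaces the single program term with one indicator variable for each non-baseline level, so the fitted model contains two program coefficients.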
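One way to see how R encodes a categorical predictor is to look at the design matrix that `lm()` would build for this formula. The sketch below recreates the example data frame and calls `model.matrix()`; the variable name `X` is just a label chosen for this illustration.

```r
# Recreate the data frame from the example
df <- data.frame(points = c(7, 7, 9, 10, 13, 14, 12, 10, 16, 19, 22, 18),
                 hours = c(1, 2, 2, 3, 2, 6, 4, 3, 4, 5, 8, 6),
                 program = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3))

# Treat program as categorical
df$program <- as.factor(df$program)

# Design matrix that lm() would build for points ~ hours + program
X <- model.matrix(~ hours + program, data = df)
colnames(X)

# Rows 1, 5, and 9 are the first player in programs 1, 2, and 3
X[c(1, 5, 9), ]
```

Players in program 1 have zeros in both indicator columns, which is what makes program 1 the baseline in this encoding.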

We can use the following syntax to fit this regression model:

```r
#convert 'program' to factor
df$program <- as.factor(df$program)

#fit linear regression model
fit <- lm(points ~ hours + program, data = df)

#view model summary
summary(fit)

Call:
lm(formula = points ~ hours + program, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.5192 -1.0064 -0.3590  0.8269  2.4551 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   6.3013     0.9462   6.660 0.000159 ***
hours         0.9744     0.3176   3.068 0.015401 *  
program2      2.2949     1.1369   2.019 0.078234 .  
program3      6.8462     1.5499   4.417 0.002235 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.403 on 8 degrees of freedom
Multiple R-squared:  0.9392, Adjusted R-squared:  0.9164 
F-statistic: 41.21 on 3 and 8 DF,  p-value: 3.276e-05
```

The values in the **Estimate** column of the regression table show the coefficients for both **program2** and **program3**, which means that **program1** was used as the baseline level for the **program** variable.
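To make the meaning of these coefficients concrete, the following sketch compares predictions for two hypothetical players who practice the same number of hours but use different programs. The `new_players` data frame and the choice of 3 hours are assumptions made only for this illustration; the difference in predicted points should match the **program3** coefficient.

```r
# Recreate the data and the model from the example
df <- data.frame(points = c(7, 7, 9, 10, 13, 14, 12, 10, 16, 19, 22, 18),
                 hours = c(1, 2, 2, 3, 2, 6, 4, 3, 4, 5, 8, 6),
                 program = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3))
df$program <- as.factor(df$program)
fit <- lm(points ~ hours + program, data = df)

# Two hypothetical players with identical practice hours but different programs
new_players <- data.frame(
  hours = c(3, 3),
  program = factor(c("1", "3"), levels = levels(df$program))
)

preds <- predict(fit, newdata = new_players)

# Difference in predicted points: program 3 minus program 1
gap <- unname(preds[2] - preds[1])
gap

# The same value appears as the program3 coefficient
unname(coef(fit)["program3"])
```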

However, suppose that we would like program3 to be the baseline level used.

We can use the following syntax with the **relevel()** function to set program3 as the baseline level and then fit the linear regression model one more time:

```r
#convert 'program' to factor
df$program <- as.factor(df$program)

#specify that program3 should be used as baseline level
df$program <- relevel(df$program, ref='3')

#fit linear regression model
fit <- lm(points ~ hours + program, data = df)

#view model summary
summary(fit)

Call:
lm(formula = points ~ hours + program, data = df)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.5192 -1.0064 -0.3590  0.8269  2.4551 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  13.1474     1.9563   6.721  0.00015 ***
hours         0.9744     0.3176   3.068  0.01540 *  
program1     -6.8462     1.5499  -4.417  0.00223 ** 
program2     -4.5513     1.1777  -3.864  0.00478 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.403 on 8 degrees of freedom
Multiple R-squared:  0.9392, Adjusted R-squared:  0.9164 
F-statistic: 41.21 on 3 and 8 DF,  p-value: 3.276e-05
```

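Notice that the **Estimate** column of the regression table now shows the coefficients for **program1** and **program2**, which means that **program3** was used as the baseline level for the **program** variable.

The overall fit is unchanged: the residuals, R-squared, F-statistic, and the **hours** coefficient are identical in both outputs. Only the intercept and the program coefficients differ, because they now measure differences relative to program 3 rather than program 1.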
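The relationship between the two parameterizations can be checked directly. The sketch below refits both models under the illustrative names `fit_base1` and `fit_base3` (names chosen here, not used elsewhere in the tutorial) and compares their fitted values and coefficients.

```r
# Recreate the data from the example
df <- data.frame(points = c(7, 7, 9, 10, 13, 14, 12, 10, 16, 19, 22, 18),
                 hours = c(1, 2, 2, 3, 2, 6, 4, 3, 4, 5, 8, 6),
                 program = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3))
df$program <- as.factor(df$program)

# Model with the default baseline (program 1)
fit_base1 <- lm(points ~ hours + program, data = df)

# Model with program 3 as the baseline
df$program <- relevel(df$program, ref = "3")
fit_base3 <- lm(points ~ hours + program, data = df)

# Both models produce the same fitted values
all.equal(fitted(fit_base1), fitted(fit_base3))

# program1 relative to program3 is the negative of program3 relative to program1
coef(fit_base3)["program1"]
-coef(fit_base1)["program3"]

# program2 relative to program3 can also be recovered from the first model
coef(fit_base1)["program2"] - coef(fit_base1)["program3"]
coef(fit_base3)["program2"]
```

Changing the baseline is therefore a matter of convenience: it determines which comparisons appear in the summary table, along with their standard errors and p-values, without changing the model's predictions.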
