How to Perform Robust Regression in R (Step-by-Step)

Robust regression is a method we can use as an alternative to ordinary least squares regression when there are outliers or influential observations in the dataset we’re working with.

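To perform robust regression in R, we can use the rlm() function from the MASS package, which uses the basic syntax rlm(formula, data), where formula describes the model to fit (for example, y~x1+x2) and data is the data frame that contains the variables.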

The following step-by-step example shows how to perform robust regression in R for a given dataset.

Step 1: Create the Data

First, let’s create a fake dataset to work with:

#create data
df <- data.frame(x1=c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3,
                      11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
                 x2=c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12,
                      25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
                  y=c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32,
                      44, 60, 61, 63, 63, 64, 61, 67, 59, 70))

#view first six rows of data
head(df)

  x1 x2   y
1  1  7  17
2  3  7 170
3  3  4  19
4  4 29 194
5  4 13  24
6  6 34   2

Step 2: Perform Ordinary Least Squares Regression

Next, let’s fit an ordinary least squares regression model and create a plot of the standardized residuals.

In practice, we often consider any standardized residual with an absolute value greater than 3 to be an outlier.

#fit ordinary least squares regression model
ols <- lm(y~x1+x2, data=df)

#create plot of y-values vs. standardized residuals
plot(df$y, rstandard(ols), ylab='Standardized Residuals', xlab='y') 

From the plot we can see that there are two observations with standardized residuals around 3.

This is an indication that there are two potential outliers in the dataset and thus we may benefit from performing robust regression instead.
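To check which rows stand out without reading them off the plot, we can rank the observations by the absolute value of their standardized residuals. The following is a minimal sketch that rebuilds the data from Step 1 so it runs on its own. The names std_res, cutoff, flagged, and top are illustrative, and the cutoff of 3 is the rule of thumb mentioned above rather than a fixed rule.

```r
#recreate the data from Step 1 so this block runs on its own
df <- data.frame(
  x1 = c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3, 11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
  x2 = c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12, 25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
  y  = c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32, 44, 60, 61, 63, 63, 64, 61, 67, 59, 70)
)

#refit the OLS model and extract the standardized residuals
ols <- lm(y ~ x1 + x2, data = df)
std_res <- rstandard(ols)

#rows whose absolute standardized residual exceeds the cutoff (may be empty)
cutoff <- 3
flagged <- which(abs(std_res) > cutoff)
flagged

#the three rows with the largest absolute standardized residuals
top <- order(abs(std_res), decreasing = TRUE)[1:3]
cbind(df[top, ], std_res = round(std_res[top], 2))
```

Because the residuals in this example sit near the cutoff, it can help to look at the ranked rows as well as the flagged ones.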

Step 3: Perform Robust Regression

Next, let’s use the rlm() function to fit a robust regression model:


#load the MASS package, which provides rlm()
library(MASS)

#fit robust regression model
robust <- rlm(y~x1+x2, data=df)
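By default, rlm() fits the model by iteratively reweighted least squares and stores the final weight of each observation in the w component of the fitted object. Looking at these weights is one way to see which observations the robust fit treats as unusual. The sketch below rebuilds the data from Step 1 so it is self-contained. The data frame name wts is illustrative.

```r
library(MASS)  #provides rlm()

#recreate the data from Step 1 so this block runs on its own
df <- data.frame(
  x1 = c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3, 11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
  x2 = c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12, 25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
  y  = c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32, 44, 60, 61, 63, 63, 64, 61, 67, 59, 70)
)

#fit the robust model with the default settings
robust <- rlm(y ~ x1 + x2, data = df)

#final weights: 1 means full weight, values near 0 mean strongly down-weighted
wts <- data.frame(row = seq_len(nrow(df)), y = df$y, weight = round(robust$w, 3))

#list the most down-weighted observations first
head(wts[order(wts$weight), ])
```

Observations with a weight well below 1 have less influence on the robust coefficient estimates than they would have in an ordinary least squares fit.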

To determine if this robust regression model offers a better fit to the data compared to the OLS model, we can calculate the residual standard error of each model.

The residual standard error (RSE) is a way to measure the standard deviation of the residuals in a regression model. The lower the value for RSE, the more closely a model is able to fit the data.
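For an ordinary least squares model, the RSE is the square root of the residual sum of squares divided by the residual degrees of freedom (here, 20 observations minus 3 estimated coefficients). As a quick check, the sketch below computes it by hand and compares it with the value that summary() reports. It rebuilds the data from Step 1 so it runs on its own, and the names rss and rse_manual are illustrative.

```r
#recreate the data from Step 1 so this block runs on its own
df <- data.frame(
  x1 = c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3, 11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
  x2 = c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12, 25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
  y  = c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32, 44, 60, 61, 63, 63, 64, 61, 67, 59, 70)
)

#fit the OLS model
ols <- lm(y ~ x1 + x2, data = df)

#RSE = sqrt(residual sum of squares / residual degrees of freedom)
rss <- sum(residuals(ols)^2)
rse_manual <- sqrt(rss / df.residual(ols))

#compare with the value stored by summary()
all.equal(rse_manual, summary(ols)$sigma)
```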

The following code shows how to calculate the RSE for each model:

#find residual standard error of ols model
summary(ols)$sigma

[1] 49.41848

#find residual standard error of robust model
summary(robust)$sigma

[1] 9.369349

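We can see that the RSE for the robust regression model is much lower than the RSE for the ordinary least squares regression model, which suggests that the robust regression model offers a better fit to the data. Keep in mind that the value reported for the rlm() model is a robust scale estimate that is less sensitive to outlying residuals, so the comparison mainly reflects how well each model fits the bulk of the data.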
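It can also be useful to compare the coefficient estimates of the two models, since outliers can pull the ordinary least squares estimates away from the pattern followed by most of the data. The following sketch rebuilds the data from Step 1, refits both models, and places the estimates side by side. The matrix name coefs is illustrative.

```r
library(MASS)  #provides rlm()

#recreate the data from Step 1 so this block runs on its own
df <- data.frame(
  x1 = c(1, 3, 3, 4, 4, 6, 6, 8, 9, 3, 11, 16, 16, 18, 19, 20, 23, 23, 24, 25),
  x2 = c(7, 7, 4, 29, 13, 34, 17, 19, 20, 12, 25, 26, 26, 26, 27, 29, 30, 31, 31, 32),
  y  = c(17, 170, 19, 194, 24, 2, 25, 29, 30, 32, 44, 60, 61, 63, 63, 64, 61, 67, 59, 70)
)

#fit both models
ols <- lm(y ~ x1 + x2, data = df)
robust <- rlm(y ~ x1 + x2, data = df)

#coefficient estimates side by side
coefs <- cbind(OLS = coef(ols), Robust = coef(robust))
round(coefs, 3)
```

Large differences between the two columns are a sign that a few observations are driving the ordinary least squares fit.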

Additional Resources

How to Perform Simple Linear Regression in R
How to Perform Multiple Linear Regression in R
How to Perform Polynomial Regression in R
