How to Perform a Bonferroni Outlier Test in R


Often you may want to check for outlier observations in a linear regression model.

This matters because outliers can distort the overall fit of the model and lead to poor predictions of the response for new, unseen observations.

One common way to check for outliers in a regression model is the Bonferroni outlier test. It computes a Studentized residual and a p-value for each observation, then applies a Bonferroni correction to account for testing every observation at once, which gives us an idea of which observations could potentially be outliers.
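For reference, here is the usual textbook form of the test, given as a general sketch rather than quoted from the car documentation. For observation i with externally Studentized residual t_i, in a model with n observations and k predictors plus an intercept, the unadjusted two-sided p-value and its Bonferroni adjustment are:

$$
p_i = 2\,\Pr\left(T_{n-k-2} > |t_i|\right), \qquad p_i^{\mathrm{Bonf}} = \min\left(1,\; n\,p_i\right)
$$

Multiplying by n is what accounts for testing all n observations at once. In the mtcars example later in this article, n = 32 and k = 2, so the reference t distribution has 28 degrees of freedom.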

The easiest way to perform the Bonferroni outlier test in R is with the outlierTest() function from the car package.

The outlierTest() function uses the following syntax:

outlierTest(model, cutoff=0.05, ...)

where:

  • model: A linear regression model fit using the lm() function
  • cutoff: Observations with Bonferroni p-values exceeding this value are not reported, unless no observations are nominated, in which case the one with the largest absolute Studentized residual is reported

You can adjust the cutoff value to change how small an observation's Bonferroni p-value must be for it to be reported as a potential outlier. The default value is 0.05.
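The reporting rule described above can be sketched in base R. The function report_bonferroni_outliers() below is a hypothetical helper written only for illustration; it is not part of car. It assumes an lm() model, uses rstudent() for the Studentized residuals, caps the Bonferroni p-values at 1, and may differ from outlierTest() in its details.

```r
# Hypothetical helper (NOT part of the car package): a simplified sketch of
# how a Bonferroni cutoff decides which observations get reported.
report_bonferroni_outliers <- function(model, cutoff = 0.05) {
  r  <- rstudent(model)                      # externally Studentized residuals
  n  <- length(r)                            # number of observations tested
  df <- df.residual(model) - 1               # t degrees of freedom for rstudent
  p  <- 2 * pt(abs(r), df = df, lower.tail = FALSE)  # unadjusted two-sided p-values
  bonf <- pmin(1, n * p)                     # Bonferroni adjustment, capped at 1

  keep <- which(bonf <= cutoff)              # observations that meet the cutoff
  nominated <- length(keep) > 0
  if (!nominated) {
    keep <- which.max(abs(r))                # fallback: largest |Studentized residual|
  }
  keep <- keep[order(bonf[keep])]            # smallest adjusted p-value first

  out <- data.frame(
    rstudent     = r[keep],
    unadjusted_p = p[keep],
    bonferroni_p = bonf[keep],
    row.names    = names(r)[keep]
  )
  attr(out, "nominated") <- nominated        # FALSE means the fallback row was used
  out
}

fit <- lm(mpg ~ disp + carb, data = mtcars)
report_bonferroni_outliers(fit)                    # default cutoff of 0.05
head(report_bonferroni_outliers(fit, cutoff = 1))  # looser cutoff: more rows reported
```

With the default cutoff no observation qualifies for this model, so the sketch falls back to the single observation with the largest absolute Studentized residual, which is the behavior described for outlierTest() above. Raising cutoff lets observations with larger adjusted p-values into the report.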

The following example shows how to use the outlierTest() function in practice in R.

Example: How to Perform a Bonferroni Outlier Test in R

For this example, we will fit a multiple linear regression model using the built-in mtcars dataset in R.

We can use the head() function to view the first few rows from this dataset:

#view head of mtcars dataset
head(mtcars)

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

The dataset contains 11 variables measured on 32 different cars.

Suppose that we would like to fit a multiple linear regression model that uses disp (engine displacement) and carb (number of carburetors) as the predictor variables to predict mpg (miles per gallon) for each car in the dataset.

We can use the following syntax to fit this regression model and view the model summary:

#fit regression model
fit <- lm(mpg ~ disp + carb, data = mtcars)

#view model summary
summary(fit)

Call:
lm(formula = mpg ~ disp + carb, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.3379 -2.0849 -0.3448  1.5118  6.2836 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 31.152710   1.263620  24.654  < 2e-16 ***
disp        -0.036296   0.004676  -7.762 1.47e-08 ***
carb        -0.955677   0.358789  -2.664   0.0125 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.964 on 29 degrees of freedom
Multiple R-squared:  0.7737,    Adjusted R-squared:  0.7581 
F-statistic: 49.58 on 2 and 29 DF,  p-value: 4.393e-10

Now suppose that we would like to perform a Bonferroni outlier test to check whether any observations in the dataset are outliers with respect to this regression model.

We can use the following syntax with the outlierTest() function to do so:

#load the car package (run install.packages("car") first if it is not installed)
library(car)

#perform Bonferroni outlier test
outlierTest(fit)
No Studentized residuals with Bonferroni p < 0.05
Largest |rstudent|:
               rstudent unadjusted p-value Bonferroni p
Toyota Corolla 2.411735           0.022681      0.72579

The output tells us that no Studentized residuals have a Bonferroni p-value below 0.05.

In other words, at the 0.05 level none of the observations is flagged as an outlier in this regression model.

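Because no observation was nominated, outlierTest() instead reports the observation with the largest absolute Studentized residual, which is the Toyota Corolla. Its unadjusted p-value (0.022681) is small on its own, but after the Bonferroni correction for testing all 32 observations its adjusted p-value is 0.72579, far above 0.05.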
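To see where these numbers come from, the Toyota Corolla row can be reproduced with base R alone. This sketch assumes the Studentized residual is compared with a t distribution on df.residual(fit) - 1 = 28 degrees of freedom, the usual reference distribution for externally Studentized residuals; it is a check on the arithmetic, not car's own code.

```r
# Reproduce the Toyota Corolla row of the outlierTest() output using base R only
fit <- lm(mpg ~ disp + carb, data = mtcars)

r  <- rstudent(fit)["Toyota Corolla"]        # externally Studentized residual
df <- df.residual(fit) - 1                   # 32 - 3 - 1 = 28 degrees of freedom
p  <- 2 * pt(abs(r), df = df, lower.tail = FALSE)  # unadjusted two-sided p-value
bonf_p <- min(1, nrow(mtcars) * p)           # Bonferroni: multiply by n = 32, cap at 1

round(c(rstudent = unname(r), unadjusted_p = unname(p), bonferroni_p = bonf_p), 5)
```

The Bonferroni step is simply the unadjusted p-value multiplied by the number of observations, which is why a residual that looks unusual on its own is not flagged once all 32 tests are taken into account.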

