One of the key assumptions in linear regression is that there is no correlation between the residuals, i.e. the residuals are independent.
One way to determine if this assumption is met is to perform a Durbin-Watson test, which is used to detect the presence of autocorrelation in the residuals of a regression. This test uses the following hypotheses:
H0 (null hypothesis): There is no correlation among the residuals.
HA (alternative hypothesis): The residuals are autocorrelated.
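For reference, the statistic behind this test is usually written as shown below. The notation is an assumption introduced here for illustration: e_t denotes the residual for observation t, n is the number of observations, and \hat{\rho} is the lag-1 sample autocorrelation of the residuals.

```latex
d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} \approx 2\,(1 - \hat{\rho})
```

The statistic always lies between 0 and 4. Values near 2 suggest little lag-1 autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.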
This tutorial explains how to perform a Durbin-Watson test in R.
Example: Durbin-Watson Test in R
To perform a Durbin-Watson test, we first need to fit a linear regression model. We will use the built-in R dataset mtcars and fit a regression model using mpg as the response variable and disp and wt as predictor variables.
    #load mtcars dataset
    data(mtcars)

    #view first six rows of dataset
    head(mtcars)

                       mpg cyl disp  hp drat    wt  qsec vs am gear carb
    Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
    Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
    Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
    Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
    Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
    Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

    #fit regression model
    model <- lm(mpg ~ disp+wt, data=mtcars)
Next, we can perform a Durbin-Watson test using the durbinWatsonTest() function from the car package:
    #load car package
    library(car)

    #perform Durbin-Watson test
    durbinWatsonTest(model)

    Loading required package: carData
     lag Autocorrelation D-W Statistic p-value
       1        0.341622      1.276569   0.034
     Alternative hypothesis: rho != 0
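From the output we can see that the test statistic is 1.276569 and the corresponding p-value is 0.034. Note that durbinWatsonTest() computes this p-value by bootstrap simulation by default, so it can vary slightly from run to run. Since this p-value is less than 0.05, we can reject the null hypothesis at the 0.05 significance level and conclude that the residuals in this regression model are autocorrelated.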
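To make the test statistic less of a black box, the following minimal sketch recomputes it by hand from the model residuals using only base R. The variable names (e, dw_manual, r1) are illustrative assumptions, and the calculation treats the row order of mtcars as the observation order. If the model matches the one fitted above, dw_manual and r1 should agree with the D-W Statistic and Autocorrelation columns in the output.

```r
# Fit the same model as in the example (mtcars ships with base R)
model <- lm(mpg ~ disp + wt, data = mtcars)

# Residuals in row order; the statistic treats this order as the sequence
e <- residuals(model)

# Durbin-Watson statistic: sum of squared successive differences
# divided by the sum of squared residuals
dw_manual <- sum(diff(e)^2) / sum(e^2)

# Lag-1 autocorrelation of the residuals
r1 <- sum(e[-1] * e[-length(e)]) / sum(e^2)

dw_manual
r1
```

A statistic noticeably below 2, paired with a positive lag-1 autocorrelation, points toward positive serial correlation, which is consistent with the conclusion drawn from the p-value.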
What to Do if Autocorrelation is Detected
If you reject the null hypothesis and conclude that autocorrelation is present in the residuals, you have several options for correcting the problem if you consider it serious enough:
- For positive serial correlation, consider adding lags of the dependent and/or independent variable to the model.
- For negative serial correlation, check to make sure that none of your variables are overdifferenced.
- For seasonal correlation, consider adding seasonal dummy variables to the model.
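The first option in the list above can be sketched with simulated data. This is only an illustration under stated assumptions: the data are artificial, with errors generated from an AR(1) process with coefficient 0.7, and all names (dw_stat, fit_static, fit_lagged, lagged) are hypothetical. It compares the Durbin-Watson statistic before and after adding one lag of the dependent and independent variable.

```r
set.seed(42)  # arbitrary seed so the simulated data are reproducible

n <- 200
x <- rnorm(n)

# Errors follow an AR(1) process with coefficient 0.7 (positive serial correlation)
err <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
y <- 1 + 2 * x + err

# Helper: Durbin-Watson statistic computed directly from a fitted model
dw_stat <- function(fit) {
  e <- residuals(fit)
  sum(diff(e)^2) / sum(e^2)
}

# Model without lags
fit_static <- lm(y ~ x)

# Model with one lag of the dependent and independent variable;
# the first observation is dropped because it has no lagged value
lagged <- data.frame(y = y[-1], x = x[-1], y_lag = y[-n], x_lag = x[-n])
fit_lagged <- lm(y ~ x + y_lag + x_lag, data = lagged)

dw_static <- dw_stat(fit_static)
dw_lagged <- dw_stat(fit_lagged)
c(static = dw_static, lagged = dw_lagged)
```

In this setup the statistic for the static model should sit well below 2, while the statistic for the lagged model should move close to 2. Keep in mind that the Durbin-Watson statistic is known to be biased toward 2 when a lagged dependent variable is among the regressors, so treat the second value as an informal check and not a formal test.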