A residual plot is a type of plot that displays the values of a predictor variable in a regression model along the x-axis and the values of the residuals along the y-axis.
This plot is used to assess whether a linear model is appropriate for the data and whether the residuals exhibit heteroscedasticity (a spread that is not constant across values of the predictor).
The following step-by-step example shows how to create a residual plot for a regression model by hand.
Step 1: Find the Predicted Values
Suppose we want to fit a regression model to the following dataset:
Using statistical software (such as Excel, R, Python, or SPSS), we can find that the fitted regression model is:
y = 10.4486 + 1.3037(x)
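Software finds these coefficients with ordinary least squares, which for a single predictor has a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. Below is a minimal Python sketch of that formula. It assumes the data are two equal-length lists of numbers. `fit_simple_ols` is a hypothetical helper name, and the code is not necessarily how any particular package computes the fit internally.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

For example, with hypothetical toy data lying exactly on y = 2 + 3x (not the dataset above), `fit_simple_ols([1, 2, 3, 4], [5, 8, 11, 14])` returns `(2.0, 3.0)`.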
We can then use this model to predict the value of y based on the value of x. For example, if x = 3, then we would predict y to be:
y = 10.4486 + 1.3037(3) = 14.359
We can repeat this process for every observation in our dataset:
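Repeating the prediction for each observation is a simple loop. The following is a minimal sketch. It assumes the rounded coefficients shown in Step 1; full-precision values from software may differ in later decimal places. `predict` and `predict_all` are hypothetical helper names.

```python
# Rounded coefficients of the fitted model from Step 1.
INTERCEPT = 10.4486
SLOPE = 1.3037


def predict(x, intercept=INTERCEPT, slope=SLOPE):
    """Return the predicted value of y for a given x: intercept + slope * x."""
    return intercept + slope * x


def predict_all(xs, intercept=INTERCEPT, slope=SLOPE):
    """Apply the fitted model to every x value, preserving order."""
    return [predict(x, intercept, slope) for x in xs]
```

`predict(3)` gives roughly 14.3597. This agrees with the 14.359 above to within the rounding of the coefficients. `predict_all([3, 5])` returns one prediction per x value.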
Step 2: Find the Residuals
A residual for a given observation in our dataset is calculated as:
Residual = observed value – predicted value
For example, the residual of the first observation would be calculated as:
Residual = 15 – 14.359 = 0.641
We can repeat this process for every observation in our dataset:
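The residual formula can be applied to every observation in one pass. Here is a minimal sketch, assuming the observed and predicted values are equal-length lists in the same order. `residuals` is a hypothetical helper name.

```python
def residuals(observed, predicted):
    """Return observed - predicted for each observation, in order."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [y - y_hat for y, y_hat in zip(observed, predicted)]
```

Using the first observation, `residuals([15], [14.359])` returns a list containing approximately 0.641. Floating-point arithmetic may introduce a tiny error in the last digits.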
Step 3: Create the Residual Plot
Lastly, we can create a residual plot by placing the x values along the x-axis and the residual values along the y-axis.
For example, the first point we’ll place in our plot is (3, 0.641).
The next point we’ll place in our plot is (5, 0.033).
We’ll continue until we’ve placed all 10 (x, residual) pairs in the plot:
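In software, the same plot is a scatter of (x, residual) pairs with a horizontal reference line at 0. The sketch below assumes matplotlib is installed for the drawing step. Both helper names are hypothetical. matplotlib is imported inside `draw_residual_plot` so that the pairing helper works without it.

```python
def residual_plot_points(xs, residuals):
    """Pair each x value with its residual; these are the points of the plot."""
    if len(xs) != len(residuals):
        raise ValueError("xs and residuals must have the same length")
    return list(zip(xs, residuals))


def draw_residual_plot(points):
    """Scatter the (x, residual) points with a dashed reference line at 0."""
    import matplotlib.pyplot as plt  # assumed to be installed

    xs = [x for x, _ in points]
    res = [r for _, r in points]
    fig, ax = plt.subplots()
    ax.scatter(xs, res)
    ax.axhline(0, color="gray", linestyle="--")  # residual = 0 line
    ax.set_xlabel("x")
    ax.set_ylabel("Residual")
    ax.set_title("Residual plot")
    return fig
```

With the first two points from the text, `residual_plot_points([3, 5], [0.641, 0.033])` returns `[(3, 0.641), (5, 0.033)]`. To display the figure, pass all 10 points to `draw_residual_plot` and then call `plt.show()` or `fig.savefig`.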
Any point above zero in the plot represents a positive residual. This means the observed value for y is greater than the value predicted by the regression model.
Any point below zero represents a negative residual. This means the observed value for y is less than the value predicted by the regression model.
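The sign rule in the two paragraphs above fits in a few lines. This sketch uses a hypothetical function name, `residual_side`. Here "above" and "below" refer to the zero line of the plot.

```python
def residual_side(residual):
    """Report where a point falls relative to the zero line of the residual plot."""
    if residual > 0:
        return "above"  # observed y is greater than the predicted y
    if residual < 0:
        return "below"  # observed y is less than the predicted y
    return "on"  # observed y equals the predicted y exactly
```

For the first observation, `residual_side(0.641)` returns `"above"`. A hypothetical residual of -0.2 would return `"below"`.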
Since the points in the plot are randomly scattered around a residual value of 0 with no clear pattern, the relationship between x and y appears to be linear, and a linear regression model is appropriate to use.
And since the spread of the residuals doesn’t systematically increase or decrease as the predictor variable gets larger, heteroscedasticity does not appear to be a problem with this regression model.
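These judgments are visual, but a rough numeric companion can help. For least-squares fits that include an intercept, the residuals always average to zero, so the useful signal lies in their pattern and spread rather than their mean. The sketch below computes the Pearson correlation between x and the absolute residuals. A value near 0 is consistent with constant spread, while a clearly positive or negative value hints at a fan shape. This is a hypothetical heuristic (`spread_trend` is not part of the method above), not a formal test. One formal alternative is the Breusch–Pagan test, available as `het_breuschpagan` in `statsmodels.stats.diagnostic`.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # Correlation is undefined when one input is constant; treat that as
        # "no linear trend" for the purpose of this heuristic.
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def spread_trend(xs, residuals):
    """Correlation between x and |residual|: a rough fan-shape indicator."""
    return pearson_r(xs, [abs(r) for r in residuals])
```

Consider hypothetical residuals whose size grows steadily with x. For example, `spread_trend([1, 2, 3, 4, 5], [0.1, -0.2, 0.3, -0.4, 0.5])` returns a value essentially equal to 1, which would point toward heteroscedasticity.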
Additional Resources
The following tutorials explain how to create residual plots using different statistical software:
How to Create a Residual Plot on a TI-84 Calculator
How to Create a Residual Plot in Excel
How to Create a Residual Plot in R
How to Create a Residual Plot in Python