How to Create and Interpret Q-Q Plots in Stata


A Q-Q plot, short for “quantile-quantile” plot, is often used to assess whether the residuals in a regression analysis are normally distributed. It plots the quantiles of the residuals against the quantiles that a normal distribution would produce.

This tutorial explains how to create and interpret a Q-Q plot in Stata.

Example: Q-Q Plot in Stata

For this example, we will use the built-in auto dataset in Stata. We will fit a multiple linear regression model using mpg and displacement as the explanatory variables and price as the response variable. We will then obtain the model’s residuals and create a Q-Q plot to check whether they follow a normal distribution.

Step 1: Load and view the data.

First, we’ll load the data using the following command (the clear option replaces any dataset already in memory):

sysuse auto, clear

Next, we’ll get a quick summary of the data using the following command:

summarize

Summarize command in Stata

Step 2: Fit the regression model.

Next, we’ll use the following command to fit the regression model:

regress price mpg displacement

Regression model output in Stata

Step 3: Calculate the residuals.

Recall that a residual is simply the difference between the actual response value and the predicted response value (as calculated by the estimated regression equation).
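Written out for this model, with y_i standing for the price of car i, the definition looks like this (the coefficient notation is ours, not part of the original tutorial):

```latex
\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1\,\mathrm{mpg}_i + \hat{\beta}_2\,\mathrm{displacement}_i,
\qquad
e_i = y_i - \hat{y}_i
```

A positive residual e_i means the model underpredicted that car’s price. A negative residual means it overpredicted.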

We can obtain the residual for each observation by using the predict command with the residuals option, storing the values in a new variable with any name we’d like. In this case, we’ll use the name resid_price:

predict resid_price, residuals
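To see that predict is computing exactly the difference defined above, you can rebuild the residuals by hand and compare them. The sketch below is self-contained, so it reloads the data and refits the model. The variable names price_hat and resid_check are illustrative only. Double precision is used so that the comparison is not thrown off by float rounding.

```stata
* Self-contained: reload the data and refit the model from the steps above.
sysuse auto, clear
quietly regress price mpg displacement
predict resid_price, residuals

* Fitted (predicted) prices, stored in double precision.
* price_hat is an illustrative variable name.
predict double price_hat, xb

* Residual = actual price minus predicted price.
generate double resid_check = price - price_hat

* Stops with an error if the hand-built residuals disagree with predict's.
assert reldif(resid_check, resid_price) < 1e-5
```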

Step 4: Create the Q-Q Plot.

Now that we have the residuals, we can create a Q-Q plot using the qnorm command. This command plots the quantiles of a variable against the quantiles of a normal distribution:

qnorm resid_price
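To make clear what qnorm is comparing, here is a sketch that builds a similar plot by hand. It sorts the residuals and pairs each one with the matching quantile of a normal distribution that has the residuals’ mean and standard deviation. It assumes plotting positions of i/(N+1). qnorm’s internal details may differ slightly, so treat this as an illustration rather than a reimplementation. The names mu, sigma, p, and z_theory are illustrative. The /// line continuation works when the sketch is run as a do-file.

```stata
* Self-contained: reload the data, refit the model, and get residuals.
sysuse auto, clear
quietly regress price mpg displacement
predict resid_price, residuals

* Mean and standard deviation of the residuals.
quietly summarize resid_price
local mu = r(mean)
local sigma = r(sd)

* Order the residuals from smallest to largest.
sort resid_price

* Assumed plotting position for the i-th smallest residual: i/(N+1).
generate double p = _n / (_N + 1)

* Matching normal quantile with the residuals' mean and SD.
generate double z_theory = `mu' + `sigma' * invnormal(p)

* Points close to the y = x line suggest approximately normal residuals.
twoway (scatter resid_price z_theory) (function y = x, range(z_theory)), ///
    xtitle("Normal quantiles") ytitle("Residual quantiles") legend(off)
```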

Q-Q plot in Stata

Step 5: Interpret the Q-Q Plot.

The idea behind a Q-Q plot is simple: if the residuals fall roughly along a straight line at a 45-degree angle, they are approximately normally distributed. In our Q-Q plot above, the residuals deviate from the 45-degree line quite a bit, especially at the tails, which suggests they may not be normally distributed.
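Tail departures can be easier to judge with reference lines on the plot. The sketch below assumes the grid option of qnorm, which adds grid lines at selected quantiles of the normal distribution. It redraws the same plot with those lines added.

```stata
* Self-contained: reload the data, refit the model, and get residuals.
sysuse auto, clear
quietly regress price mpg displacement
predict resid_price, residuals

* Same Q-Q plot as above, with grid lines at selected normal quantiles.
qnorm resid_price, grid
```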

Although a Q-Q plot isn’t a formal statistical test, it offers an easy way to visually check whether or not the residuals are normally distributed.
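If you want a formal test alongside the visual check, one option is the Shapiro-Wilk test, available in Stata as swilk. This is a supplementary sketch and not one of the tutorial’s steps. Its null hypothesis is that the residuals come from a normal distribution, so a small p-value is evidence against normality. The 0.05 cutoff mentioned in the comment is only a common convention.

```stata
* Self-contained: reload the data, refit the model, and get residuals.
sysuse auto, clear
quietly regress price mpg displacement
predict resid_price, residuals

* Shapiro-Wilk W test for normality of the residuals.
swilk resid_price

* p-value stored in r(p); values below a chosen cutoff (often 0.05)
* are evidence against normality.
display "Shapiro-Wilk p-value: " r(p)
```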

If your residuals deviate severely from the 45-degree line in the Q-Q plot, you may consider transforming the response variable in your regression, for example by taking its square root or its log.
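As a sketch of the log option, the code below refits the same model with log(price) as the response and redraws the Q-Q plot. Whether this improves the fit for this dataset is something to judge from the resulting plot, not something assumed here. The names log_price and resid_log are illustrative.

```stata
* Self-contained: reload the data.
sysuse auto, clear

* Natural log of the response (price is positive for every car here).
generate log_price = log(price)

* Refit the model on the transformed response and get its residuals.
regress log_price mpg displacement
predict resid_log, residuals

* Compare this Q-Q plot with the one for the untransformed model.
qnorm resid_log
```

Keep in mind that after a log transformation, the coefficients describe effects on log(price). They are therefore interpreted differently from those in the original model.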

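If the residuals deviate only slightly, you don’t need to worry about transforming the response variable. The coefficient estimates do not depend on normality, and the usual tests and confidence intervals are fairly robust to mild departures from it, especially in larger samples.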
