Simple linear regression is a method you can use to understand the relationship between an explanatory variable, x, and a response variable, y.
This tutorial explains how to perform simple linear regression in Stata.
Example: Simple Linear Regression in Stata
Suppose we are interested in understanding the relationship between the weight of a car and its miles per gallon. To explore this relationship, we can perform simple linear regression using weight as an explanatory variable and miles per gallon as a response variable.
Perform the following steps in Stata to conduct a simple linear regression using the dataset called auto, which contains data on 74 different cars.
Step 1: Load the data.
Load the data by typing the following into the Command box:
sysuse auto
Step 2: Get a summary of the data.
Gain a quick understanding of the data you’re working with by typing the following into the Command box:
summarize
We can see that there are 12 different variables in the dataset, but the only two that we care about are mpg and weight.
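Since only mpg and weight matter for this analysis, it can help to restrict the summary to those two variables. The following is a minimal sketch, assuming the auto dataset that ships with Stata; pairing summarize with describe is only a suggestion, not a required step.

```stata
* Load the example dataset that ships with Stata (clear drops any data in memory)
sysuse auto, clear

* Summary statistics (observations, mean, std. dev., min, max) for the two variables
summarize mpg weight

* Storage type and variable label for each of the two variables
describe mpg weight
```

The describe output is a quick way to confirm the units of each variable before interpreting any coefficients.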
Step 3: Visualize the data.
Before we perform simple linear regression, let’s first create a scatterplot of mpg against weight so we can visualize the relationship between these two variables and check for any obvious outliers. Type the following into the Command box to create a scatterplot:
scatter mpg weight
This produces a scatterplot with mpg on the y-axis and weight on the x-axis.
We can see that cars with higher weights tend to have lower miles per gallon. To quantify this relationship, we will now perform a simple linear regression.
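To make the downward trend easier to see, a fitted line can be overlaid on the scatterplot. This is a minimal sketch, assuming the auto dataset is available; the overlay is an optional extra and is not part of the original steps.

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* Scatterplot of mpg (y-axis) against weight (x-axis) with a linear fit overlaid
twoway (scatter mpg weight) (lfit mpg weight)
```

Points that sit far from the fitted line are candidates for the "obvious outliers" check mentioned above.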
Step 4: Perform simple linear regression.
Type the following into the Command box to perform a simple linear regression using weight as an explanatory variable and mpg as a response variable.
regress mpg weight
Here is how to interpret the most interesting numbers in the output:
R-squared: 0.6515. This is the proportion of the variance in the response variable that can be explained by the explanatory variable. In this example, 65.15% of the variation in mpg can be explained by weight.
Coef (weight): -0.006. This tells us the average change in the response variable associated with a one unit increase in the explanatory variable. In this example, each one pound increase in weight is associated with a decrease of 0.006 in mpg, on average.
Coef (_cons): 39.44028. This tells us the average value of the response variable when the explanatory variable is zero. In this example, the average mpg is 39.44028 when the weight of a car is zero. This doesn’t actually make much sense to interpret since the weight of a car can’t be zero, but the number 39.44028 is needed to form a regression equation.
P>|t| (weight): 0.000. This is the p-value associated with the test statistic for weight. Stata rounds this value to three decimal places, so 0.000 means the p-value is less than 0.0005, not that it is exactly zero. Since this value is less than 0.05, we can conclude that there is a statistically significant relationship between weight and mpg.
Regression Equation: Lastly, we can form a regression equation using the two coefficient values. In this case, the equation would be:
predicted mpg = 39.44028 – 0.0060087*(weight)
We can use this equation to find the predicted mpg for a car, given its weight. For example, a car that weighs 4,000 pounds is predicted to have mpg of 15.405:
predicted mpg = 39.44028 – 0.0060087*(4000) = 15.405
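The hand calculation above can also be reproduced in Stata from the coefficients stored after regress. The following is a sketch under the assumption that the model has just been fit on the auto dataset; mpg_hat is a hypothetical variable name chosen for illustration.

```stata
* Load the data and fit the model
sysuse auto, clear
regress mpg weight

* Predicted mpg at weight = 4000, computed from the stored coefficients
display _b[_cons] + _b[weight] * 4000

* The same prediction, reported with a standard error and confidence interval
margins, at(weight = 4000)

* Fitted values for every car in the dataset (mpg_hat is a hypothetical name)
predict mpg_hat, xb
```

The first display should come out close to the 15.405 computed by hand; any small difference would come from Stata using the unrounded coefficients.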
Step 5: Report the results.
Lastly, we want to report the results of our simple linear regression. Here is an example of how to do so:
A linear regression was performed to quantify the relationship between the weight of a car and its miles per gallon. A sample of 74 cars was used in the analysis.
Results showed that there was a statistically significant relationship between weight and mpg (t = -11.60, p < 0.0001), and weight accounted for 65.15% of the variability in mpg.
The regression equation was found to be:
predicted mpg = 39.44 – 0.006*(weight)
Each additional pound was associated with a decrease, on average, of 0.006 miles per gallon.
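The figures quoted in a report like this one can be pulled from the results that regress stores, rather than copied from the output table. This is a minimal sketch, assuming the model is fit on the auto dataset as in the steps above.

```stata
* Load the data and fit the model
sysuse auto, clear
regress mpg weight

* Sample size and R-squared
display e(N)
display e(r2)

* t statistic for weight: coefficient divided by its standard error
display _b[weight] / _se[weight]

* Two-sided p-value for weight, shown with more digits than the output table
display 2 * ttail(e(df_r), abs(_b[weight] / _se[weight]))
```

Computing the p-value this way shows how small it actually is, which supports reporting it as p < 0.0001 instead of the rounded 0.000.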