Simple linear regression is a method we can use to understand the relationship between an explanatory variable, x, and a response variable, y.
This tutorial explains how to perform simple linear regression in Excel.
Example: Simple Linear Regression in Excel
Suppose we are interested in understanding the relationship between the number of hours a student studies for an exam and the exam score they receive.
To explore this relationship, we can perform simple linear regression using hours studied as an explanatory variable and exam score as a response variable.
Perform the following steps in Excel to conduct a simple linear regression.
Step 1: Enter the data.
Enter the following data for the number of hours studied and the exam score received for 20 students:
Step 2: Visualize the data.
Before we perform simple linear regression, it’s helpful to create a scatterplot of the data to check that the relationship between hours studied and exam score is roughly linear.
Highlight the data in columns A and B. Along the top ribbon in Excel go to the Insert tab. Within the Charts group, click Insert Scatter (X, Y) and click on the first option titled Scatter. This will automatically produce the following scatterplot:
The number of hours studied is shown on the x-axis and the exam scores are shown on the y-axis. The relationship between the two variables appears roughly linear and positive: more hours studied is associated with higher exam scores.
To quantify the relationship between these two variables, we can perform simple linear regression.
Step 3: Perform simple linear regression.
Along the top ribbon in Excel, go to the Data tab and click on Data Analysis. If you don’t see this option, you first need to load the free Analysis ToolPak add-in.
Once you click on Data Analysis, a new window will pop up. Select Regression and click OK.
For Input Y Range, select the range of cells that contains the response variable (exam score). For Input X Range, select the range of cells that contains the explanatory variable (hours studied). Include the header cell in both ranges.
Check the box next to Labels so Excel knows that we included the variable names in the input ranges.
For Output Range, select a cell where you would like the output of the regression to appear.
Then click OK.
The following output will automatically appear:
Step 4: Interpret the output.
Here is how to interpret the most relevant numbers in the output:
R Square: 0.7273. This is known as the coefficient of determination. It is the proportion of the variance in the response variable that can be explained by the explanatory variable. In this example, 72.73% of the variation in the exam scores can be explained by the number of hours studied.
Standard error: 5.2805. This is the standard error of the regression, which measures the typical distance that the observed values fall from the regression line. In this example, the observed exam scores fall roughly 5.2805 points from the regression line, on average.
F: 47.9952. This is the overall F statistic for the regression model, calculated as regression MS / residual MS.
Significance F: 0.0000. This is the p-value associated with the overall F statistic, shown here rounded to four decimal places. It tells us whether the regression model is statistically significant, that is, whether the explanatory variable has a statistically significant association with the response variable. In this case the p-value is less than 0.05, which indicates a statistically significant association between hours studied and exam score received.
Coefficients: The coefficients give us the numbers necessary to write the estimated regression equation. In this example the estimated regression equation is:
exam score = 67.16 + 5.2503*(hours)
We interpret the coefficient for hours to mean that each additional hour studied is associated with an average increase of 5.2503 points in exam score. We interpret the intercept to mean that the expected exam score for a student who studies zero hours is 67.16.
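To make the quantities above more concrete, here is a minimal Python sketch of how the coefficients, R Square, standard error, and F statistic can be computed by ordinary least squares. It is not what Excel runs internally, only an illustration of the standard formulas. The function names `fit_simple_regression` and `f_from_r_square` are made up for this sketch, and the small dataset is toy data invented for the example, not the 20 students from this tutorial.

```python
import math


def fit_simple_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of squares and cross-products around the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Total, residual, and regression sums of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = sst - sse

    ms_regression = ssr / 1        # one explanatory variable
    ms_residual = sse / (n - 2)    # n - 2 residual degrees of freedom

    return {
        "intercept": intercept,
        "slope": slope,
        "r_square": ssr / sst,
        "standard_error": math.sqrt(ms_residual),
        "f": ms_regression / ms_residual,
    }


def f_from_r_square(r_square, n):
    """Overall F statistic for simple regression, from R Square and sample size."""
    return r_square * (n - 2) / (1 - r_square)


# Toy data (hypothetical, NOT the tutorial's dataset)
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
result = fit_simple_regression(hours, scores)
for name, value in result.items():
    print(f"{name}: {value:.4f}")

# Sanity check with the values reported in this tutorial (R Square = 0.7273, n = 20).
# The result is close to, but not exactly, the reported F because R Square is rounded.
print(f"F from reported R Square: {f_from_r_square(0.7273, 20):.2f}")
```

For a model with one explanatory variable, the F statistic can be recovered from R Square and the sample size alone, which is a quick way to check that the numbers in the output are consistent with each other.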
We can use this estimated regression equation to calculate the expected exam score for a student, based on the number of hours they study.
For example, a student who studies for three hours is expected to receive an exam score of 82.91:
exam score = 67.16 + 5.2503*(3) = 82.91
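The same calculation can be written as a small Python sketch, which is handy when predicting scores for several students at once. The coefficients are taken from the regression output above; the function name `predict_exam_score` is only an illustrative choice.

```python
# Coefficients from the estimated regression equation in this tutorial
INTERCEPT = 67.16
SLOPE = 5.2503


def predict_exam_score(hours):
    """Expected exam score for a given number of hours studied."""
    return INTERCEPT + SLOPE * hours


print(round(predict_exam_score(3), 2))  # 82.91
```

Keep in mind that predictions are only reasonable for study times within the range of hours observed in the data used to fit the model.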
The following tutorials explain how to perform other common tasks in Excel:
How to Create a Residual Plot in Excel
How to Construct a Prediction Interval in Excel
How to Create a Q-Q Plot in Excel