**Simple linear regression **is a method we can use to understand the relationship between an explanatory variable, x, and a response variable, y.

This tutorial explains how to perform simple linear regression in Excel.

**Example: Simple Linear Regression in Excel**

Suppose we are interested in understanding the relationship between the number of hours a student studies for an exam and the exam score they receive.

To explore this relationship, we can perform simple linear regression using **hours studied** as an explanatory variable and **exam score **as a response variable.

Perform the following steps in Excel to conduct a simple linear regression.

**Step 1: Enter the data.**

Enter the following data for the number of hours studied and the exam score received for 20 students:

**Step 2: Visualize the data.**

Before we perform simple linear regression, it’s helpful to create a scatterplot of the data to make sure there actually exists a linear relationship between hours studied and exam score.

Highlight the data in columns A and B. Along the top ribbon in Excel go to the **Insert **tab. Within the **Charts** group, click **Insert Scatter (X, Y) **and click on the first option titled **Scatter**. This will automatically produce the following scatterplot:

The number of hours studied is shown on the x-axis and the exam scores are shown on the y-axis. We can see that there is a linear relationship between the two variables – more hours studied is associated with higher exam scores.

To quantify the relationship between these two variables, we can perform simple linear regression.

**Step 3: Perform simple linear regression.**

Along the top ribbon in Excel, go to the **Data** tab and click on **Data Analysis**. If you don’t see this option, then you need to first install the free Analysis ToolPak.

Once you click on **Data Analysis,** a new window will pop up. Select **Regression **and click OK.

For **Input Y Range**, fill in the array of values for the response variable. For **Input X Range**, fill in the array of values for the explanatory variable.

Check the box next to **Labels **so Excel knows that we included the variable names in the input ranges.

For **Output Range**, select a cell where you would like the output of the regression to appear.

Then click **OK**.

The following output will automatically appear:

**Step 4: Interpret the output.**

Here is how to interpret the most relevant numbers in the output:

**R Square: ****0.7273**. This is known as the coefficient of determination. It is the proportion of the variance in the response variable that can be explained by the explanatory variable. In this example, 72.73% of the variation in the exam scores can be explained by the number of hours studied.

**Standard error:** **5.2805**. This is the average distance that the observed values fall from the regression line. In this example, the observed values fall an average of 5.2805 units from the regression line.

**F: 47.9952**. This is the overall F statistic for the regression model, calculated as regression MS / residual MS.

**Significance F: 0.0000**. This is the p-value associated with the overall F statistic. It tells us whether or not the regression model is statistically significant. In other words, it tells us if the explanatory variable has a statistically significant association with the response variable. In this case the p-value is less than 0.05, which indicates that there is a statistically significant association between hours studied and exam score received.

**Coefficients: **The coefficients give us the numbers necessary to write the estimated regression equation. In this example the estimated regression equation is:

**exam score = 67.16 + 5.2503*(hours)**

We interpret the coefficient for hours to mean that for each additional hour studied, the exam score is expected to increase by **5.2503**, on average. We interpret the coefficient for the intercept to mean that the expected exam score for a student who studies zero hours is **67.16**.

We can use this estimated regression equation to calculate the expected exam score for a student, based on the number of hours they study.

For example, a student who studies for three hours is expected to receive an exam score of **82.91**:

exam score = 67.16 + 5.2503*(3) = 82.91

**Additional Resources**

The following tutorials explain how to perform other common tasks in Excel:

How to Create a Residual Plot in Excel

How to Construct a Prediction Interval in Excel

How to Create a Q-Q Plot in Excel

I am not sure why (or really how) in this case a significance level of .05 was chosen. I see the significance F was 0.0000 which is smaller than .05, but how was the level of .05 chosen?