How to Construct a Prediction Interval in Excel


In statistics, simple linear regression is a technique we can use to quantify the relationship between a predictor variable, x, and a response variable, y. 

When we conduct a simple linear regression, we obtain a “line of best fit” that describes the relationship between x and y, which can be written as:

ŷ = b0 + b1x

where:

  • ŷ is the predicted value of the response variable
  • b0 is the y-intercept
  • b1 is the regression coefficient
  • x is the value of the predictor variable

Sometimes we’re interested in using this line of best fit to construct a prediction interval for a given value of x0, which is an interval around the predicted value ŷ0 such that there is a 95% probability that the real value of y in the population corresponding to x0 is within this interval.

The formula to calculate the prediction interval for a given value x0 is written as:

ŷ0  +/-  tα/2,df=n-2 * s.e.

where:

s.e. = Syx√(1 + 1/n + (x0x)2/SSx)

The formula might look a bit intimidating, but it’s actually straightforward to calculate in Excel. Next, we’ll walk through an example of how to use this formula to calculate a prediction interval for a given value in Excel.

Example: How to Construct a Prediction Interval in Excel

The following dataset shows the number of hours studied along with the exam score received by 15 different students:

Example dataset in Excel

Suppose we would like to create a 95% prediction interval for the value x0 = 3. That is, we want to create an interval such that there is a 95% probability that the exam score is within this interval for a student who studies for 3 hours.

The following screenshot shows how to calculate all of the necessary values to obtain this prediction interval.

Note: The formulas in column show how the values in column were calculated.

How to calculate a prediction interval in Excel

The 95% prediction interval for a value of x0 = 3 is (74.64, 86.90). That is, we predict with 95% probability that a student who studies for 3 hours will earn a score between 74.64 and 86.90.

A couple notes on the calculations used:

  • To calculate the t-critical value of tα/2,df=n-2 we used α/2 = .05/2 = 0.25 since we wanted a 95% prediction interval. Note that higher prediction intervals (e.g. 99% prediction interval) will lead to wider intervals. Conversely, a lower prediction interval (e.g. 90% prediction interval) will lead to a more narrow interval.
  • We used the formula =FORECAST() to obtain the predicted value for ŷbut the formula =FORECAST.LINEAR() will return the exact same value.

Leave a Reply

Your email address will not be published.