Simple linear regression is used to quantify the relationship between a predictor variable and a response variable.
This method finds a line that best “fits” a dataset and takes on the following form:
ŷ = b0 + b1x
- ŷ: The estimated response value
- b0: The intercept of the regression line
- b1: The slope of the regression line
- x: The value of the predictor variable
Often we’re interested in the value for b1, which tells us the average change in the response variable associated with a one-unit increase in the predictor variable.
However, in rare circumstances we’re also interested in the value for b0, which tells us the average value of the response variable when the predictor variable is equal to zero.
We can use the following formula to calculate a confidence interval for the value of β0, the true population intercept:
Confidence Interval for β0: b0 ± tα/2, n-2 * se(b0)
The following example shows how to calculate a confidence interval for an intercept in practice.
Example: Confidence Interval for Regression Intercept
Suppose we’d like to fit a simple linear regression model using hours studied as a predictor variable and exam score as a response variable for 15 students in a particular class.
The following code shows how to fit this simple linear regression model in R:
#create data frame
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))

#fit simple linear regression model
fit <- lm(score ~ hours, data=df)

#view summary of model
summary(fit)

Call:
lm(formula = score ~ hours, data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-5.140 -3.219 -1.193  2.816  5.772 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   65.334      2.106  31.023 1.41e-13 ***
hours          1.982      0.248   7.995 2.25e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.641 on 13 degrees of freedom
Multiple R-squared:  0.831,	Adjusted R-squared:  0.818 
F-statistic: 63.91 on 1 and 13 DF,  p-value: 2.253e-06
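Rather than calculating the interval by hand, we can also use R's built-in confint() function, which returns confidence intervals for every coefficient in a fitted model. Here is a quick sketch using the same data:

```r
#recreate the data frame and model from above
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))
fit <- lm(score ~ hours, data=df)

#95% confidence intervals for the intercept and the slope
confint(fit, level=0.95)
```

The first row of the output is the confidence interval for the intercept, which should match the manual calculation shown later in this tutorial (up to rounding).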
Using the coefficient estimates in the output, we can write the fitted simple linear regression model as:
Score = 65.334 + 1.982*(Hours Studied)
The intercept value is 65.334. This tells us that the mean estimated exam score for a student who studies for zero hours is 65.334.
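Because the intercept is the fitted value at x = 0, we can confirm this interpretation by asking the model to predict the score for a student who studies zero hours (a sketch that rebuilds the same model from above):

```r
#recreate the data frame and model from above
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))
fit <- lm(score ~ hours, data=df)

#predicted exam score when hours = 0 equals the intercept
predict(fit, newdata=data.frame(hours=0))

#compare with the intercept coefficient
coef(fit)["(Intercept)"]
```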
We can use the following formula to calculate a 95% confidence interval for the intercept:
- 95% C.I. for β0: b0 ± tα/2, n-2 * se(b0)
- 95% C.I. for β0: 65.334 ± t.05/2, 15-2 * 2.106
- 95% C.I. for β0: 65.334 ± 2.1604 * 2.106
- 95% C.I. for β0: [60.78, 69.88]
We interpret this to mean that we’re 95% confident that the true population mean exam score for students who study for zero hours is between 60.78 and 69.88.
Note: We used the Inverse t Distribution Calculator to find the t critical value that corresponds to a 95% confidence level with 13 degrees of freedom.
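The same t critical value is available in R via the qt() function, so the manual calculation above can be reproduced entirely in code (a sketch using the model's own coefficient table):

```r
#recreate the data frame and model from above
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))
fit <- lm(score ~ hours, data=df)

#t critical value for a 95% confidence level with n - 2 = 13 degrees of freedom
t_crit <- qt(0.975, df=13)

#intercept estimate and its standard error from the coefficient table
b0    <- coef(summary(fit))["(Intercept)", "Estimate"]
se_b0 <- coef(summary(fit))["(Intercept)", "Std. Error"]

#95% confidence interval for the intercept
c(b0 - t_crit*se_b0, b0 + t_crit*se_b0)
```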
Cautions on Calculating a Confidence Interval for a Regression Intercept
We often don’t calculate a confidence interval for a regression intercept in practice because it usually doesn’t make sense to interpret the value of the intercept in a regression model.
For example, suppose we fit a regression model that uses height of a basketball player as a predictor variable and average points per game as a response variable.
It’s not possible for a player to be zero feet tall, so it wouldn’t make sense to interpret the intercept literally in this model.
There are countless scenarios like this where a predictor variable can’t take on a value of zero, so it doesn’t make sense to interpret the intercept value of the model or create a confidence interval for the intercept.
For example, consider the following potential predictor variables in a model:
- Square footage of a house
- Length of a car
- Weight of a person
None of these predictor variables can take on a value of zero, so it wouldn’t make sense to calculate a confidence interval for the intercept of a regression model in any of these circumstances.