# How to Calculate a Confidence Interval for a Regression Intercept

Simple linear regression is used to quantify the relationship between a predictor variable and a response variable.

This method finds a line that best “fits” a dataset and takes on the following form:

ŷ = b0 + b1x

where:

• ŷ: The estimated response value
• b0: The intercept of the regression line
• b1: The slope of the regression line
• x: The value of the predictor variable

Often we’re interested in the value for b1, which tells us the average change in the response variable associated with a one-unit increase in the predictor variable.

However, in rare circumstances we’re also interested in the value for b0, which tells us the average value of the response variable when the predictor variable is equal to zero.

We can use the following formula to calculate a confidence interval for the value of β0, the true population intercept:

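Confidence Interval for β0: b0 ± t_{α/2, n-2} * se(b0)

where:

• b0: The intercept estimated from the sample
• t_{α/2, n-2}: The t critical value for a confidence level of 1 − α with n − 2 degrees of freedom
• n: The number of observations in the dataset
• se(b0): The standard error of the intercept estimate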
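The formula above relies on se(b0) without showing how it is obtained. For reference, the standard textbook expression for simple linear regression is sketched below. The symbols s (the residual standard error), x̄ (the sample mean of the predictor), and the index i are notation introduced here for illustration and do not appear elsewhere in this article.

```latex
\[
se(b_0) = s \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}
\]
```

In practice this rarely needs to be computed by hand: in R, the `summary()` output for an `lm` model reports s as the residual standard error and se(b0) in the Std. Error column of the intercept row.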

The following example shows how to calculate a confidence interval for an intercept in practice.

### Example: Confidence Interval for Regression Intercept

Suppose we’d like to fit a simple linear regression model using hours studied as a predictor variable and exam score as a response variable for 15 students in a particular class.

The following code shows how to fit this simple linear regression model in R:

```
#create data frame
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))

#fit simple linear regression model
fit <- lm(score ~ hours, data=df)

#view summary of model
summary(fit)

Call:
lm(formula = score ~ hours, data = df)

Residuals:
   Min     1Q Median     3Q    Max
-5.140 -3.219 -1.193  2.816  5.772

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   65.334      2.106  31.023 1.41e-13 ***
hours          1.982      0.248   7.995 2.25e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.641 on 13 degrees of freedom
Multiple R-squared:  0.831,	Adjusted R-squared:  0.818
F-statistic: 63.91 on 1 and 13 DF,  p-value: 2.253e-06
```

Using the coefficient estimates in the output, we can write the fitted simple linear regression model as:

Score = 65.334 + 1.982*(Hours Studied)

The intercept value is 65.334. This tells us that the estimated mean exam score for a student who studies for zero hours is 65.334.

We can use the following formula to calculate a 95% confidence interval for the intercept:

• 95% C.I. for β0: b0 ± t_{α/2, n-2} * se(b0)
• 95% C.I. for β0: 65.334 ± t_{.05/2, 15-2} * 2.106
• 95% C.I. for β0: 65.334 ± 2.1604 * 2.106
• 95% C.I. for β0: [60.78, 69.88]
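The hand calculation above can also be reproduced directly in R. The following is a minimal sketch that pulls the intercept and its standard error from the fitted model; the variable names `coefs`, `b0`, `se_b0`, `t_crit`, and `ci` are chosen here for illustration.

```r
#re-create the data frame and model from the example
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))
fit <- lm(score ~ hours, data=df)

#extract the intercept estimate and its standard error from the coefficient table
coefs <- coef(summary(fit))
b0 <- coefs["(Intercept)", "Estimate"]
se_b0 <- coefs["(Intercept)", "Std. Error"]

#t critical value for 95% confidence with n-2 residual degrees of freedom
t_crit <- qt(1 - 0.05/2, df=df.residual(fit))

#lower and upper bounds of the interval
ci <- c(lower=b0 - t_crit*se_b0, upper=b0 + t_crit*se_b0)

#rounded to two decimals, this should agree with the interval computed by hand
round(ci, 2)
```

Because R works with unrounded values internally, the bounds may differ from a hand calculation in the last decimal place when rounded inputs are used.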

We interpret this to mean that we’re 95% confident that the true population mean exam score for students who study for zero hours is between 60.78 and 69.88.
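This interpretation can be checked in R: a confidence interval for the mean response at zero hours should coincide with the confidence interval for the intercept. The sketch below uses `predict()` with `interval="confidence"`; the name `pred0` is chosen here for illustration.

```r
#re-create the data frame and model from the example
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))
fit <- lm(score ~ hours, data=df)

#95% confidence interval for the mean exam score when hours = 0
pred0 <- predict(fit, newdata=data.frame(hours=0),
                 interval="confidence", level=0.95)

#columns: fit (point estimate), lwr (lower bound), upr (upper bound)
pred0
```

The `fit` column equals the intercept estimate, and `lwr` and `upr` match the bounds of the intercept's confidence interval.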

Note: We used the Inverse t Distribution Calculator to find the t critical value that corresponds to a 95% confidence level with 13 degrees of freedom.
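As an alternative to looking up the critical value separately, R's built-in `confint()` function handles that step internally and returns the interval directly. A minimal sketch, with `ci_int` as an illustrative variable name:

```r
#re-create the data frame and model from the example
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))
fit <- lm(score ~ hours, data=df)

#95% confidence interval for the intercept only
ci_int <- confint(fit, parm="(Intercept)", level=0.95)
ci_int
```

Omitting the `parm` argument returns intervals for both the intercept and the slope.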

### Cautions on Calculating a Confidence Interval for a Regression Intercept

In practice we often don’t calculate a confidence interval for a regression intercept, because the intercept is only meaningful when a value of zero is plausible for the predictor variable.

For example, suppose we fit a regression model that uses height of a basketball player as a predictor variable and average points per game as a response variable.

It’s not possible for a player to be zero feet tall, so it wouldn’t make sense to interpret the intercept literally in this model.

There are many scenarios like this where a predictor variable can’t take on a value of zero, so it doesn’t make sense to interpret the intercept of the model or to create a confidence interval for it.

For example, consider the following potential predictor variables in a model:

• Square footage of a house
• Length of a car
• Weight of a person

None of these predictor variables can take on a value of zero, so it wouldn’t make sense to calculate a confidence interval for the intercept of a regression model in any of these circumstances.