How to Calculate Confidence Interval for Regression Coefficient in R


In a linear regression model, a regression coefficient tells us the average change in the response variable associated with a one-unit increase in the predictor variable.

We can use the following formula to calculate a confidence interval for a regression coefficient:

Confidence Interval for β1: b1 ± t_{1-α/2, n-2} * se(b1)

where:

  • b1 = The regression coefficient shown in the regression table
  • t_{1-α/2, n-2} = The t critical value for confidence level 1-α with n-2 degrees of freedom, where n is the total number of observations in our dataset
  • se(b1) = The standard error of b1 shown in the regression table
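
The formula above can be sketched as a small R helper. The function name coef_ci and its argument names are hypothetical, chosen only for illustration; qt() is the base R quantile function for the t distribution. The inputs in the example call are made-up values, not taken from any dataset.

```r
#hypothetical helper: confidence interval for the slope in simple linear regression
#b1 = estimated coefficient, se_b1 = its standard error,
#n = number of observations, level = confidence level (1 - alpha)
coef_ci <- function(b1, se_b1, n, level=0.95) {
  alpha <- 1 - level
  #t critical value with n-2 degrees of freedom
  t_crit <- qt(1 - alpha/2, df=n - 2)
  c(lower=b1 - t_crit*se_b1, upper=b1 + t_crit*se_b1)
}

#illustrative inputs only
coef_ci(b1=2, se_b1=0.5, n=20)
```

The n-2 degrees of freedom apply to a model with one predictor plus an intercept, which is the case this formula covers.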

The following example shows how to calculate a confidence interval for a regression slope in practice.

Example: Confidence Interval for Regression Coefficient in R

Suppose we’d like to fit a simple linear regression model using hours studied as a predictor variable and exam score as a response variable for 15 students in a particular class.

We can use the lm() function to fit this simple linear regression model in R:

#create data frame
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))

#fit linear regression model
fit <- lm(score ~ hours, data=df)

#view model summary
summary(fit)

Call:
lm(formula = score ~ hours, data = df)

Residuals:
   Min     1Q Median     3Q    Max 
-5.140 -3.219 -1.193  2.816  5.772 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   65.334      2.106  31.023 1.41e-13 ***
hours          1.982      0.248   7.995 2.25e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.641 on 13 degrees of freedom
Multiple R-squared:  0.831,	Adjusted R-squared:  0.818 
F-statistic: 63.91 on 1 and 13 DF,  p-value: 2.253e-06

Using the coefficient estimates in the output, we can write the fitted simple linear regression model as:

Score = 65.334 + 1.982*(Hours Studied)

Notice that the regression coefficient for hours is 1.982.

This tells us that each additional hour studied is associated with an average increase of 1.982 points in exam score.
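
One way to see this interpretation in code is to compare the model's predictions for two students whose study time differs by exactly one hour. This is a minimal sketch that recreates the example data; the choice of 5 and 6 hours is arbitrary.

```r
#recreate the data frame and model from the example
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))
fit <- lm(score ~ hours, data=df)

#predicted exam scores for 5 and 6 hours studied
pred <- predict(fit, newdata=data.frame(hours=c(5, 6)))

#the difference between the two predictions equals the coefficient for 'hours'
diff(pred)
coef(fit)['hours']
```

Because the model is linear, the same difference appears for any pair of values that are one hour apart.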

We can use the confint() function to calculate a 95% confidence interval for the regression coefficient:

#calculate confidence interval for regression coefficient for 'hours'
confint(fit, 'hours', level=0.95)

         2.5 %   97.5 %
hours 1.446682 2.518068

The 95% confidence interval for the regression coefficient is [1.447, 2.518].

Since this confidence interval doesn’t contain the value 0, we can conclude that there is a statistically significant association between hours studied and exam score at the α = 0.05 level.
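
The link between the interval and significance can be checked directly: a 95% confidence interval excludes 0 exactly when the two-sided p-value for the coefficient is below 0.05. The sketch below recreates the example model; the variable names excludes_zero and p_value are illustrative.

```r
#recreate the data frame and model from the example
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))
fit <- lm(score ~ hours, data=df)

#95% confidence interval for the coefficient for 'hours'
ci <- confint(fit, 'hours', level=0.95)

#check whether the interval lies entirely above or entirely below 0
excludes_zero <- ci[1] > 0 | ci[2] < 0

#p-value for 'hours' from the coefficient table
p_value <- coef(summary(fit))['hours', 'Pr(>|t|)']

#both checks lead to the same conclusion
excludes_zero
p_value < 0.05
```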

We can also confirm this is correct by calculating the 95% confidence interval for the regression coefficient by hand:

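  • 95% C.I. for β1: b1 ± t_{1-α/2, n-2} * se(b1)
  • 95% C.I. for β1: 1.982 ± t_{0.975, 15-2} * 0.248
  • 95% C.I. for β1: 1.982 ± 2.1604 * 0.248
  • 95% C.I. for β1: [1.446, 2.518]

This matches the confint() output up to rounding: the by-hand lower limit is 1.446 rather than 1.447 because the estimate and standard error taken from the regression table are themselves rounded.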

Note #1: We used the Inverse t Distribution Calculator to find the t critical value that corresponds to a 95% confidence level with 13 degrees of freedom.
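
As an alternative to an online calculator, the t critical value can also be obtained in R with qt(), and the by-hand calculation can be repeated with the unrounded estimate and standard error. This is a minimal sketch that recreates the example model; the variable names b1, se_b1 and t_crit are illustrative.

```r
#recreate the data frame and model from the example
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))
fit <- lm(score ~ hours, data=df)

#extract the estimate and standard error for 'hours' from the coefficient table
coefs <- coef(summary(fit))
b1 <- coefs['hours', 'Estimate']
se_b1 <- coefs['hours', 'Std. Error']

#t critical value for a 95% confidence level with n-2 = 13 degrees of freedom
t_crit <- qt(0.975, df=nrow(df) - 2)

#lower and upper limits of the confidence interval
c(b1 - t_crit*se_b1, b1 + t_crit*se_b1)
```

Since no values are rounded along the way, the two limits agree with the confint() output.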

Note #2: To calculate a confidence interval with a different confidence level, simply change the value for the level argument in the confint() function.
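
For instance, the following sketch computes 90% and 99% intervals for the same coefficient. It recreates the example model; the variable names ci_90 and ci_99 are illustrative.

```r
#recreate the data frame and model from the example
df <- data.frame(hours=c(1, 2, 4, 5, 5, 6, 6, 7, 8, 10, 11, 11, 12, 12, 14),
                 score=c(64, 66, 76, 73, 74, 81, 83, 82, 80, 88, 84, 82, 91, 93, 89))
fit <- lm(score ~ hours, data=df)

#calculate 90% confidence interval for regression coefficient for 'hours'
ci_90 <- confint(fit, 'hours', level=0.90)
ci_90

#calculate 99% confidence interval for regression coefficient for 'hours'
ci_99 <- confint(fit, 'hours', level=0.99)
ci_99
```

A higher confidence level uses a larger t critical value, so the 99% interval is wider than the 90% interval.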

Additional Resources

The following tutorials provide additional information about linear regression in R:

How to Interpret Regression Output in R
How to Perform Simple Linear Regression in R
How to Perform Multiple Linear Regression in R
How to Perform Logistic Regression in R
