To fit a linear regression model in R, we can use the lm() command.
To view the output of the regression model, we can then use the summary() command.
This tutorial explains how to interpret every value in the regression output in R.
Example: Interpreting Regression Output in R
The following code shows how to fit a multiple linear regression model with the built-in mtcars dataset using hp, drat, and wt as predictor variables and mpg as the response variable:
#fit regression model using hp, drat, and wt as predictors
model <- lm(mpg ~ hp + drat + wt, data = mtcars)

#view model summary
summary(model)

Call:
lm(formula = mpg ~ hp + drat + wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.3598 -1.8374 -0.5099  0.9681  5.7078 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 29.394934   6.156303   4.775 5.13e-05 ***
hp          -0.032230   0.008925  -3.611 0.001178 ** 
drat         1.615049   1.226983   1.316 0.198755    
wt          -3.227954   0.796398  -4.053 0.000364 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.561 on 28 degrees of freedom
Multiple R-squared:  0.8369,    Adjusted R-squared:  0.8194 
F-statistic: 47.88 on 3 and 28 DF,  p-value: 3.768e-11
Here is how to interpret every value in the output:
Call:
lm(formula = mpg ~ hp + drat + wt, data = mtcars)
This section reminds us of the formula that we used in our regression model. We can see that we used mpg as the response variable and hp, drat, and wt as our predictor variables. Each variable came from the dataset called mtcars.
Residuals:
    Min      1Q  Median      3Q     Max 
-3.3598 -1.8374 -0.5099  0.9681  5.7078 
This section displays a summary of the distribution of residuals from the regression model. Recall that a residual is the difference between the observed value and the predicted value from the regression model.
The minimum residual was -3.3598, the median residual was -0.5099 and the max residual was 5.7078.
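As a quick check, we can reproduce this section by hand. The following is a minimal sketch that refits the same model so it runs on its own; the names res and res_quantiles are illustrative and are not part of the original output.

```r
# refit the model from this tutorial so the snippet runs on its own
model <- lm(mpg ~ hp + drat + wt, data = mtcars)

# a residual is the observed mpg minus the mpg predicted by the model
res <- mtcars$mpg - fitted(model)

# min, 1st quartile, median, 3rd quartile, and max of the residuals
res_quantiles <- quantile(res)
print(round(res_quantiles, 4))
```

The five values printed should line up with the Min, 1Q, Median, 3Q, and Max columns shown above.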
Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 29.394934   6.156303   4.775 5.13e-05 ***
hp          -0.032230   0.008925  -3.611 0.001178 ** 
drat         1.615049   1.226983   1.316 0.198755    
wt          -3.227954   0.796398  -4.053 0.000364 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This section displays the estimated coefficients of the regression model. We can use these coefficients to form the following estimated regression equation:
mpg = 29.39 – .03*hp + 1.62*drat – 3.23*wt
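To see the equation in action, we can plug in values for one car and compare the result with predict(). This is only a sketch: the values in new_car (hp = 110, drat = 3.9, wt = 2.62) are chosen for illustration, and the names b, by_hand, and by_predict are our own.

```r
# refit the model from this tutorial so the snippet runs on its own
model <- lm(mpg ~ hp + drat + wt, data = mtcars)

# unrounded coefficients behind the estimated regression equation
b <- coef(model)

# illustrative predictor values for a single car
new_car <- data.frame(hp = 110, drat = 3.9, wt = 2.62)

# apply the regression equation term by term
by_hand <- b[["(Intercept)"]] +
  b[["hp"]] * new_car$hp +
  b[["drat"]] * new_car$drat +
  b[["wt"]] * new_car$wt

# let R apply the same equation
by_predict <- predict(model, newdata = new_car)

print(c(by_hand = by_hand, by_predict = unname(by_predict)))
```

Both numbers should agree, since predict() applies the same coefficients. Using the rounded coefficients from the equation above would give a slightly different value.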
For each predictor variable, we’re given the following values:
Estimate: The estimated coefficient. This tells us the average change in the response variable associated with a one-unit increase in the predictor variable, assuming all other predictor variables are held constant. For example, each one-unit increase in wt is associated with an average decrease of 3.23 in mpg.
Std. Error: This is the standard error of the coefficient. This is a measure of the uncertainty in our estimate of the coefficient.
t value: This is the t-statistic for the predictor variable, calculated as (Estimate) / (Std. Error).
Pr(>|t|): This is the p-value that corresponds to the t-statistic. If this value is less than some alpha level (e.g. 0.05), then the predictor variable is said to be statistically significant.
If we used an alpha level of α = .05 to determine which predictors were significant in this regression model, we’d say that hp and wt are statistically significant predictors while drat is not.
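The t value and Pr(>|t|) columns can be recomputed from the first two columns. The sketch below assumes the usual two-sided t-test with the residual degrees of freedom; the names t_by_hand, p_by_hand, and significant are illustrative.

```r
# refit the model from this tutorial so the snippet runs on its own
model <- lm(mpg ~ hp + drat + wt, data = mtcars)

# coefficient table as a numeric matrix
coefs <- coef(summary(model))

# t value = Estimate / Std. Error
t_by_hand <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# two-sided p-value from the t distribution with residual degrees of freedom
p_by_hand <- 2 * pt(abs(t_by_hand), df = df.residual(model), lower.tail = FALSE)

# terms with a p-value below alpha = 0.05 (this also lists the intercept)
significant <- names(p_by_hand)[p_by_hand < 0.05]
print(significant)
```

Note that the intercept also appears in the result, because summary() tests it in the same way as the predictors.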
Assessing Model Fit
Residual standard error: 2.561 on 28 degrees of freedom
Multiple R-squared:  0.8369,    Adjusted R-squared:  0.8194 
F-statistic: 47.88 on 3 and 28 DF,  p-value: 3.768e-11
This last section displays various numbers that help us assess how well the regression model fits our dataset.
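Residual standard error: This is an estimate of the standard deviation of the residuals. Roughly speaking, it tells us the typical distance that the observed values fall from the regression line, in the units of the response variable. The smaller the value, the better the regression model is able to fit the data.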
The degrees of freedom is calculated as n-k-1 where n = total observations and k = number of predictors. In this example, mtcars has 32 observations and we used 3 predictors in the regression model, thus the degrees of freedom is 32 – 3 – 1 = 28.
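We can verify both numbers with a few lines of R. This is a minimal sketch that computes the residual standard error as the square root of the residual sum of squares divided by the degrees of freedom; the names n, k, df_resid, and rse_by_hand are our own.

```r
# refit the model from this tutorial so the snippet runs on its own
model <- lm(mpg ~ hp + drat + wt, data = mtcars)

n <- nrow(mtcars)               # total observations
k <- length(coef(model)) - 1    # number of predictors (excluding the intercept)
df_resid <- n - k - 1           # residual degrees of freedom

# residual standard error = sqrt(residual sum of squares / degrees of freedom)
rse_by_hand <- sqrt(sum(residuals(model)^2) / df_resid)

print(c(df = df_resid, rse = round(rse_by_hand, 3)))
```

The same value is stored in summary(model)$sigma.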
Multiple R-Squared: This is known as the coefficient of determination. It tells us the proportion of the variance in the response variable that can be explained by the predictor variables.
This value ranges from 0 to 1. The closer it is to 1, the better the predictor variables are able to predict the value of the response variable.
Adjusted R-squared: This is a modified version of R-squared that has been adjusted for the number of predictors in the model. For a model with at least one predictor, it is lower than the R-squared unless the model fits the data perfectly.
The adjusted R-squared can be useful for comparing the fit of different regression models that use different numbers of predictor variables.
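Both values can be computed from the residuals. The sketch below uses the standard formulas R-squared = 1 - RSS/TSS and adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1); the names rss, tss, r2, and adj_r2 are illustrative.

```r
# refit the model from this tutorial so the snippet runs on its own
model <- lm(mpg ~ hp + drat + wt, data = mtcars)

n <- nrow(mtcars)               # total observations
k <- length(coef(model)) - 1    # number of predictors

# residual sum of squares: variation left unexplained by the model
rss <- sum(residuals(model)^2)

# total sum of squares: variation of mpg around its mean
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# proportion of variance explained
r2 <- 1 - rss / tss

# penalize R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(c(r2 = r2, adj_r2 = adj_r2), 4))
```

These should match summary(model)$r.squared and summary(model)$adj.r.squared.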
F-statistic: This indicates whether the regression model provides a better fit to the data than a model that contains no predictor variables (an intercept-only model). In essence, it tests if the regression model as a whole is useful.
p-value: This is the p-value that corresponds to the F-statistic. If this value is less than some significance level (e.g. 0.05), then we conclude that the regression model fits the data significantly better than a model with no predictors.
When building regression models, we hope that this p-value is less than some significance level because it indicates that the predictor variables are actually useful for predicting the value of the response variable.
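To make the comparison concrete, we can fit the intercept-only model ourselves and compare it with the full model. This is a sketch under the assumption that an explicit anova() comparison is an acceptable stand-in for the overall F-test; the names null_model, f, f_p_value, and comparison are our own.

```r
# refit the model from this tutorial so the snippet runs on its own
model <- lm(mpg ~ hp + drat + wt, data = mtcars)

# model with no predictors: it predicts the mean of mpg for every car
null_model <- lm(mpg ~ 1, data = mtcars)

# F-statistic with its numerator and denominator degrees of freedom
f <- summary(model)$fstatistic

# p-value: upper tail of the F distribution
f_p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# the same test expressed as a comparison of the two nested models
comparison <- anova(null_model, model)

print(f)
print(f_p_value)
print(comparison)
```

The F value and p-value in the second row of the anova() table should agree with the last line of the summary() output.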