The programming language R offers the following functions for fitting linear models:
1. lm – Used to fit linear models
This function uses the following syntax:
lm(formula, data, …)
where:
- formula: The formula for the linear model (e.g. y ~ x1 + x2)
- data: The name of the data frame that contains the data
2. glm – Used to fit generalized linear models
This function uses the following syntax:
glm(formula, family=gaussian, data, …)
where:
- formula: The formula for the linear model (e.g. y ~ x1 + x2)
- family: The statistical family to use to fit the model. Default is gaussian but other options include binomial, Gamma, and poisson among others.
- data: The name of the data frame that contains the data
Note that the only difference between these two functions is the family argument included in the glm() function.
If you use lm() or glm() to fit a linear regression model, they will produce the exact same results.
However, the glm() function can also be used to fit more complex models like:
- Logistic regression (family=binomial)
- Poisson regression (family=poisson)
The following examples show how to use the lm() function and glm() function in practice.
Example of Using the lm() Function
The following code shows how to fit a linear regression model using the lm() function:
#fit multiple linear regression model model <- lm(mpg ~ disp + hp, data=mtcars) #view model summary summary(model) Call: lm(formula = mpg ~ disp + hp, data = mtcars) Residuals: Min 1Q Median 3Q Max -4.7945 -2.3036 -0.8246 1.8582 6.9363 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 30.735904 1.331566 23.083 < 2e-16 *** disp -0.030346 0.007405 -4.098 0.000306 *** hp -0.024840 0.013385 -1.856 0.073679 . --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 3.127 on 29 degrees of freedom Multiple R-squared: 0.7482, Adjusted R-squared: 0.7309 F-statistic: 43.09 on 2 and 29 DF, p-value: 2.062e-09
Examples of Using the glm() Function
The following code shows how to fit the exact same linear regression model using the glm() function:
#fit multiple linear regression model model <- glm(mpg ~ disp + hp, data=mtcars) #view model summary summary(model) Call: glm(formula = mpg ~ disp + hp, data = mtcars) Deviance Residuals: Min 1Q Median 3Q Max -4.7945 -2.3036 -0.8246 1.8582 6.9363 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 30.735904 1.331566 23.083 < 2e-16 *** disp -0.030346 0.007405 -4.098 0.000306 *** hp -0.024840 0.013385 -1.856 0.073679 . --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for gaussian family taken to be 9.775636) Null deviance: 1126.05 on 31 degrees of freedom Residual deviance: 283.49 on 29 degrees of freedom AIC: 168.62 Number of Fisher Scoring iterations: 2
Notice that the coefficient estimates and standard errors of the coefficient estimates are the exact same as those produced by the lm() function.
Note that we can also use the glm() function to fit a logistic regression model by specifying family=binomial as follows:
#fit logistic regression model model <- glm(am ~ disp + hp, data=mtcars, family=binomial) #view model summary summary(model) Call: glm(formula = am ~ disp + hp, family = binomial, data = mtcars) Deviance Residuals: Min 1Q Median 3Q Max -1.9665 -0.3090 -0.0017 0.3934 1.3682 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 1.40342 1.36757 1.026 0.3048 disp -0.09518 0.04800 -1.983 0.0474 * hp 0.12170 0.06777 1.796 0.0725 . --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 43.230 on 31 degrees of freedom Residual deviance: 16.713 on 29 degrees of freedom AIC: 22.713 Number of Fisher Scoring iterations: 8
We can also use the glm() function to fit a Poisson regression model by specifying family=poisson as follows:
#fit Poisson regression model model <- glm(am ~ disp + hp, data=mtcars, family=poisson) #view model summary summary(model) Call: glm(formula = am ~ disp + hp, family = poisson, data = mtcars) Deviance Residuals: Min 1Q Median 3Q Max -1.1266 -0.4629 -0.2453 0.1797 1.5428 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 0.214255 0.593463 0.361 0.71808 disp -0.018915 0.007072 -2.674 0.00749 ** hp 0.016522 0.007163 2.307 0.02107 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 23.420 on 31 degrees of freedom Residual deviance: 10.526 on 29 degrees of freedom AIC: 42.526 Number of Fisher Scoring iterations: 6
Additional Resources
How to Perform Simple Linear Regression in R
How to Perform Multiple Linear Regression in R
How to Use the predict function with glm in R