Two of the most commonly used regression models are linear regression and logistic regression.
Both types of regression models are used to quantify the relationship between one or more predictor variables and a response variable, but there are some key differences between the two models:
Difference #1: Type of Response Variable
A linear regression model is used when the response variable takes on a continuous value, such as price, height, age, or distance.
Conversely, a logistic regression model is used when the response variable takes on a categorical value such as:
- Yes or No
- Male or Female
- Win or Not Win
Difference #2: Equation Used
Linear regression uses the following equation to summarize the relationship between the predictor variable(s) and the response variable:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$
where:
- $Y$: The response variable
- $X_j$: The jth predictor variable
- $\beta_j$: The average effect on Y of a one-unit increase in $X_j$, holding all other predictors fixed
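As a minimal sketch of how this equation turns coefficients into a prediction, the snippet below evaluates it for a single observation. The function name `predict_linear` and the coefficient values are hypothetical, chosen only for illustration; they do not come from a fitted model.

```python
def predict_linear(intercept, coefs, x):
    """Return beta_0 + beta_1*x_1 + ... + beta_p*x_p for one observation."""
    if len(coefs) != len(x):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * xj for b, xj in zip(coefs, x))

# Hypothetical coefficients, chosen only for illustration.
beta_0 = 10.0
betas = [2.0, 0.5]

y_a = predict_linear(beta_0, betas, [3.0, 4.0])  # 10 + 2*3 + 0.5*4
y_b = predict_linear(beta_0, betas, [4.0, 4.0])  # same X2, X1 raised by one unit
print(y_a, y_b, y_b - y_a)
```

Raising the first predictor by one unit while holding the second fixed changes the prediction by exactly the first coefficient, which is the interpretation of the coefficients given above.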
Conversely, logistic regression uses the following equation:
$$p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}$$
This equation is used to predict the probability that an individual observation falls into a certain category.
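To make the logistic equation concrete, here is a small sketch that evaluates it for one observation. The function name `predict_probability` and the coefficient values are assumptions made for illustration only. The two branches are algebraically the same formula; choosing between them by the sign of the exponent simply avoids overflow for large inputs.

```python
import math

def predict_probability(intercept, coefs, x):
    """Return p(X) = e^z / (1 + e^z), where z = beta_0 + beta_1*x_1 + ... + beta_p*x_p."""
    z = intercept + sum(b * xj for b, xj in zip(coefs, x))
    if z >= 0:
        # Equivalent to e^z / (1 + e^z) after dividing top and bottom by e^z.
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical coefficients, chosen only for illustration.
beta_0 = -4.0
betas = [1.0, 0.5]

p_mid = predict_probability(beta_0, betas, [2.0, 4.0])    # z = 0
p_high = predict_probability(beta_0, betas, [6.0, 4.0])   # z = 4
p_low = predict_probability(beta_0, betas, [-2.0, 4.0])   # z = -4
print(p_mid, p_high, p_low)
```

Whatever values the predictors take, the result stays between 0 and 1, which is why this form is suited to modeling a probability.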
Difference #3: Method Used to Fit Equation
Linear regression uses a method known as ordinary least squares to find the best fitting regression equation.
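For the special case of a single predictor, ordinary least squares has a simple closed-form solution, sketched below in plain Python. The data are made up for illustration, and the helper names `fit_simple_ols` and `sum_squared_residuals` are hypothetical. With several predictors the same idea applies, but the solution is usually computed with matrix routines from a statistics library.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sum_squared_residuals(intercept, slope, x, y):
    """Sum of squared gaps between observed and fitted values."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
best = sum_squared_residuals(b0, b1, x, y)
# A slightly different line should fit worse than the least squares line.
nudged = sum_squared_residuals(b0 + 0.1, b1 - 0.05, x, y)
print(b0, b1, best, nudged)
```

The comparison with the nudged line illustrates what "best fitting" means here: no other intercept and slope give a smaller sum of squared residuals on this data.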
Conversely, logistic regression uses a method known as maximum likelihood estimation to find the best fitting regression equation.
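Maximum likelihood estimation has no closed-form solution for logistic regression, so the coefficients are found iteratively. The sketch below uses plain gradient ascent on the log-likelihood with one predictor; this is only one of several possible optimizers and is an assumption of this example, not something the tutorial prescribes. The data, step size, and iteration count are also made up for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable e^z / (1 + e^z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_likelihood(b0, b1, x, y):
    """Log of the probability the model assigns to the observed 0/1 labels."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total

def fit_logistic(x, y, learning_rate=0.1, steps=5000):
    """Climb the log-likelihood by gradient ascent, starting from zero coefficients."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        # Gradient of the average log-likelihood: observed label minus predicted probability.
        errors = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * xi for e, xi in zip(errors, x)) / n
    return b0, b1

# Made-up data, for illustration only. The classes overlap, so a finite maximum exists.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x, y)
ll_start = log_likelihood(0.0, 0.0, x, y)
ll_fit = log_likelihood(b0, b1, x, y)
print(b0, b1, ll_start, ll_fit)
```

The fitted coefficients give a higher log-likelihood than the starting values, meaning the model assigns more probability to the labels that were actually observed.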
Difference #4: Output to Predict
Linear regression predicts a continuous value as the output. For example:
- Price ($150, $199, $400, etc.)
- Height (14 inches, 2 feet, 94.32 centimeters, etc.)
- Age (2 months, 6 years, 41.5 years, etc.)
- Distance (1.23 miles, 4.5 kilometers, etc.)
Conversely, logistic regression predicts probabilities as the output. For example:
- 40.3% chance of getting accepted to a university.
- 93.2% chance of winning a game.
- 34.2% chance of a law getting passed.
When to Use Logistic vs. Linear Regression
The following practice problems can help you gain a better understanding of when to use logistic regression or linear regression.
Problem #1: Annual Income
Suppose an economist wants to use predictor variables (1) weekly hours worked and (2) years of education to predict the annual income of individuals.
In this scenario, he would use linear regression because the response variable (annual income) is continuous.
Problem #2: University Acceptance
Suppose a college admissions officer wants to use the predictor variables (1) GPA and (2) ACT score to predict the probability that a student will get accepted into a certain university.
In this scenario, she would use logistic regression because the response variable is categorical and can only take on two values: accepted or not accepted.
Problem #3: Home Price
Suppose a real estate agent wants to use the predictor variables (1) square footage, (2) number of bedrooms, and (3) number of bathrooms to predict the selling price of houses.
In this scenario, she would use linear regression because the response variable (price) is continuous.
Problem #4: Spam Detection
Suppose a computer programmer wants to use the predictor variables (1) number of words and (2) country of origin to predict the probability that a given email is spam.
In this scenario, he would use logistic regression because the response variable is categorical and can only take on two values – spam or not spam.
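The rule applied in all four problems can be written down in a few lines. This is only a sketch of the rule of thumb used in this tutorial, where every categorical response has exactly two categories; the function name `choose_regression` and the labels in the dictionary are hypothetical.

```python
def choose_regression(response_type):
    """Map the kind of response variable to the model discussed in this tutorial."""
    if response_type == "continuous":
        return "linear regression"
    if response_type == "categorical":
        return "logistic regression"
    raise ValueError(f"unknown response type: {response_type!r}")

# The four practice problems, keyed by their response variable.
problems = {
    "annual income": "continuous",
    "university acceptance": "categorical",
    "home price": "continuous",
    "spam detection": "categorical",
}

choices = {name: choose_regression(kind) for name, kind in problems.items()}
for name, model in choices.items():
    print(f"{name}: {model}")
```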
The following tutorials offer more details on linear regression:
- Introduction to Simple Linear Regression
- Introduction to Multiple Linear Regression
- 4 Examples of Using Linear Regression in Real Life
The following tutorials offer more details on logistic regression: