Logistic Regression vs. Linear Regression: The Key Differences


Two of the most commonly used regression models are linear regression and logistic regression.

Both types of regression models are used to quantify the relationship between one or more predictor variables and a response variable, but they differ in four key ways, summarized below.

Difference #1: Type of Response Variable

A linear regression model is used when the response variable takes on a continuous value such as:

  • Price
  • Height
  • Age
  • Distance

Conversely, a logistic regression model is used when the response variable takes on a categorical value, most commonly one of two classes, such as:

  • Yes or No
  • Male or Female
  • Win or Not Win

Difference #2: Equation Used

Linear regression uses the following equation to summarize the relationship between the predictor variable(s) and the response variable:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$

where:

  • Y: The response variable
  • Xj: The jth predictor variable
  • βj: The average effect on Y of a one unit increase in Xj, holding all other predictors fixed
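As a rough illustration of how the linear equation turns predictor values into a prediction, here is a minimal Python sketch. The function name `predict_linear` and the coefficient values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def predict_linear(intercept, coefs, x):
    """Return b0 + b1*x1 + ... + bp*xp for a single observation."""
    return intercept + sum(b * xj for b, xj in zip(coefs, x))


# Hypothetical coefficients: b0 = 10, b1 = 2, b2 = -1
y_hat = predict_linear(10.0, [2.0, -1.0], [3.0, 4.0])
print(y_hat)  # 10 + 2*3 - 1*4 = 12.0

# Raising x1 by one unit while holding x2 fixed changes the prediction by b1.
print(predict_linear(10.0, [2.0, -1.0], [4.0, 4.0]) - y_hat)  # 2.0
```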

Conversely, logistic regression uses the following equation:

$$p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p}}$$

This equation is used to predict the probability that an individual observation falls into a certain category.
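The logistic equation can be sketched in Python as follows. The function name `predict_probability` and the coefficient values are hypothetical. The code branches on the sign of the linear term only to avoid numeric overflow for very large values; both branches are algebraically the same as the equation above.

```python
import math


def predict_probability(intercept, coefs, x):
    """Return p(X) = e^z / (1 + e^z), where z = b0 + b1*x1 + ... + bp*xp."""
    z = intercept + sum(b * xj for b, xj in zip(coefs, x))
    if z >= 0:
        # Equivalent form 1 / (1 + e^(-z)); avoids overflow when z is large.
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients: b0 = -4, b1 = 1. At x1 = 4 the linear term is 0.
print(predict_probability(-4.0, [1.0], [4.0]))  # 0.5
```

Whatever values the predictors take, the result always lies between 0 and 1, which is what allows it to be read as a probability.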

Difference #3: Method Used to Fit Equation

Linear regression uses a method known as ordinary least squares, which chooses the coefficients that minimize the sum of squared differences between the observed and predicted response values.
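For a single predictor, the least squares coefficients have a simple closed form. The sketch below assumes one predictor and uses made-up data that lies exactly on a line; the function name `fit_ols_simple` is hypothetical.

```python
def fit_ols_simple(xs, ys):
    """Least squares fit of y = b0 + b1*x for one predictor."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Made-up data generated from y = 1 + 2x with no noise.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
print(fit_ols_simple(xs, ys))  # (1.0, 2.0)
```

With several predictors the same idea applies, but the coefficients are normally found with matrix methods rather than this one-variable formula.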

Conversely, logistic regression uses a method known as maximum likelihood estimation, which chooses the coefficients that make the observed outcomes most probable under the model.
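Unlike least squares, the logistic coefficients generally have no closed-form solution, so they are found iteratively. The sketch below is one possible illustration, assuming a single predictor and plain gradient ascent on the log-likelihood; statistical software typically uses faster algorithms. The function names, data, learning rate, and step count are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function e^z / (1 + e^z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of 0/1 outcomes ys under the model p = sigmoid(b0 + b1*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_logistic_mle(xs, ys, learning_rate=0.1, steps=5000):
    """Approximate the maximum likelihood estimates by gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys))
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in zip(xs, ys))
        b0 += learning_rate * g0 / n
        b1 += learning_rate * g1 / n
    return b0, b1


# Made-up data: the outcome tends to be 1 for larger x, with some overlap.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_mle(xs, ys)
print(b0, b1)
print(log_likelihood(0.0, 0.0, xs, ys), log_likelihood(b0, b1, xs, ys))
```

The fitted coefficients should give a higher log-likelihood than the starting values of zero. The two classes overlap in this data on purpose: if they were perfectly separated, the estimates would grow without bound.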

Difference #4: Output to Predict

Linear regression predicts a continuous value as the output. For example:

  • Price ($150, $199, $400, etc.)
  • Height (14 inches, 2 feet, 94.32 centimeters, etc.)
  • Age (2 months, 6 years, 41.5 years, etc.)
  • Distance (1.23 miles, 4.5 kilometers, etc.)

Conversely, logistic regression predicts probabilities as the output. For example:

  • 40.3% chance of getting accepted to a university.
  • 93.2% chance of winning a game.
  • 34.2% chance of a law getting passed.

When to Use Logistic vs. Linear Regression

The following practice problems can help you gain a better understanding of when to use logistic regression or linear regression.

Problem #1: Annual Income

Suppose an economist wants to use predictor variables (1) weekly hours worked and (2) years of education to predict the annual income of individuals.

In this scenario, he would use linear regression because the response variable (annual income) is continuous.

Problem #2: University Acceptance

Suppose a college admissions officer wants to use the predictor variables (1) GPA and (2) ACT score to predict the probability that a student will get accepted into a certain university.

In this scenario, she would use logistic regression because the response variable is categorical and can only take on two values – accepted or not accepted.

Problem #3: Home Price

Suppose a real estate agent wants to use the predictor variables (1) square footage, (2) number of bedrooms, and (3) number of bathrooms to predict the selling price of houses.

In this scenario, she would use linear regression because the response variable (price) is continuous.

Problem #4: Spam Detection

Suppose a computer programmer wants to use the predictor variables (1) number of words and (2) country of origin to predict the probability that a given email is spam.

In this scenario, he would use logistic regression because the response variable is categorical and can only take on two values – spam or not spam.
