ANOVA vs. Regression: What’s the Difference?


Two commonly used models in statistics are ANOVA and regression models.

These two types of models share the following similarity:

  • The response variable in each model is continuous. Examples of continuous variables include weight, height, length, width, time, age, etc.

However, these two types of models differ in the following way:

  • ANOVA models are used when the predictor variables are categorical. Examples of categorical variables include level of education, eye color, marital status, etc.
  • Regression models are used when the predictor variables are continuous.*

*Regression models can be used with categorical predictor variables, but we have to create dummy variables in order to use them.

The following examples show when to use ANOVA vs. regression models in practice.

Example 1: ANOVA Model Preferred

Suppose a biologist wants to understand whether or not four different fertilizers lead to the same average plant growth (in inches) during a one-month period. To test this, she applies each fertilizer to 20 plants and records the growth of each plant after one month.

In this scenario, the biologist should use a one-way ANOVA model to analyze the differences between the fertilizers because there is one predictor variable and it is categorical.

In other words, the values for the predictor variable can be classified into the following “categories”:

  • Fertilizer 1
  • Fertilizer 2
  • Fertilizer 3
  • Fertilizer 4

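A one-way ANOVA will tell the biologist whether or not the mean plant growth is equal across the four fertilizers. If the test is statistically significant, at least one fertilizer has a mean growth that differs from the others.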
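
As a rough sketch of what the one-way ANOVA computes, the following Python snippet calculates the F statistic by hand. The growth values are hypothetical and use only 5 plants per fertilizer for brevity (the biologist in the example would have 20), and the function name one_way_anova_f is our own, not part of any library.

```python
# Hypothetical growth measurements (inches); 5 plants per fertilizer for brevity.
growth = {
    "Fertilizer 1": [4.0, 5.0, 6.0, 5.0, 5.0],
    "Fertilizer 2": [6.0, 7.0, 8.0, 7.0, 7.0],
    "Fertilizer 3": [5.0, 5.0, 6.0, 4.0, 5.0],
    "Fertilizer 4": [8.0, 9.0, 7.0, 8.0, 8.0],
}


def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of individual plants around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova_f(list(growth.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F statistic suggests that the group means are not all equal. Turning it into a p-value requires the F distribution, which statistical software handles for you (for example, scipy.stats.f_oneway in SciPy returns both the statistic and the p-value).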

Example 2: Regression Model Preferred

Suppose a real estate agent wants to understand the relationship between square footage and house price. To analyze this relationship, he collects data on square footage and house price for 200 houses in a particular city.

In this scenario, the real estate agent should use a simple linear regression model to analyze the relationship between these two variables because the predictor variable (square footage) is continuous.

Using simple linear regression, the real estate agent can fit the following regression model:

House price = β0 + β1(square footage)

The value for β1 will represent the average change in house price associated with each additional square foot.

This will allow the real estate agent to quantify the relationship between square footage and house price.
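
To make the fitted model concrete, here is a minimal sketch that estimates β0 and β1 by ordinary least squares in plain Python. The five data points are hypothetical (the agent in the example would have 200 houses), and fit_simple_regression is an illustrative helper name.

```python
# Hypothetical data: square footage and sale price (dollars) for 5 houses.
square_feet = [1000.0, 1500.0, 2000.0, 2500.0, 3000.0]
price = [200000.0, 260000.0, 330000.0, 370000.0, 440000.0]


def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope = covariation of x and y divided by variation of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


b0, b1 = fit_simple_regression(square_feet, price)
print(f"House price = {b0:.0f} + {b1:.0f}(square footage)")
```

With this made-up data, the slope works out to 118, meaning each additional square foot is associated with an average increase of $118 in house price.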

Example 3: Regression Model with Dummy Variables Preferred

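Suppose a real estate agent wants to understand how the predictor variables “square footage” and “home type” (single-family, apartment, townhome) relate to the response variable, house price.

In this scenario, the real estate agent can use multiple linear regression by converting “home type” into dummy variables, since it’s a categorical variable. A categorical variable with three categories requires two dummy variables: one that equals 1 for single-family homes (and 0 otherwise) and one that equals 1 for apartments (and 0 otherwise). Townhomes, which have a 0 for both, serve as the baseline category.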

The real estate agent can then fit the following multiple linear regression model:

House price = β0 + β1(square footage) + β2(single-family) + β3(apartment)

Here’s how we would interpret the coefficients in the model:

  • β1: The average change in house price associated with one extra square foot.
  • β2: The average difference in price between a single-family home and a townhome, assuming square footage is held constant.
  • β3: The average difference in price between an apartment and a townhome, assuming square footage is held constant.
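
The interpretation above can be illustrated with a short Python sketch. The coefficient values are hypothetical, chosen only for illustration rather than estimated from data, and encode_home_type and predict_price are helper names we made up. Townhome is treated as the baseline category, so it gets a 0 for both dummy variables.

```python
HOME_TYPES = ("single-family", "apartment", "townhome")


def encode_home_type(home_type):
    """Return the dummy variables (single_family, apartment).

    Townhome is the baseline category, so it is encoded as (0, 0).
    """
    if home_type not in HOME_TYPES:
        raise ValueError(f"Unknown home type: {home_type!r}")
    return int(home_type == "single-family"), int(home_type == "apartment")


# Hypothetical coefficients for illustration only (not fitted to real data).
B0, B1, B2, B3 = 50000.0, 100.0, 40000.0, -20000.0


def predict_price(square_feet, home_type):
    """House price = B0 + B1(square footage) + B2(single-family) + B3(apartment)."""
    single_family, apartment = encode_home_type(home_type)
    return B0 + B1 * square_feet + B2 * single_family + B3 * apartment


# Compare home types at the same square footage.
sqft = 1500
townhome_price = predict_price(sqft, "townhome")
diff_single_family = predict_price(sqft, "single-family") - townhome_price
diff_apartment = predict_price(sqft, "apartment") - townhome_price
print(diff_single_family)  # equals B2
print(diff_apartment)      # equals B3
```

Because square footage is the same in each comparison, the differences in predicted price reduce to β2 and β3, which is exactly what "holding square footage constant" means.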

