How to Perform a Two-Way ANOVA in Python

A two-way ANOVA is used to determine whether or not there is a statistically significant difference between the means of three or more independent groups that have been split on two factors.

The purpose of a two-way ANOVA is to determine how two factors impact a response variable, and to determine whether or not there is an interaction between the two factors on the response variable.
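
For reference, the model behind a two-way ANOVA with interaction is commonly written as shown below. The symbols are notation introduced here for illustration and do not appear elsewhere in this tutorial: y_ijk is the response for observation k at level i of the first factor and level j of the second, mu is the overall mean, alpha_i and beta_j are the two main effects, (alpha beta)_ij is the interaction effect, and epsilon_ijk is random error.

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}
```

The ANOVA then tests three null hypotheses: all alpha_i are zero, all beta_j are zero, and all (alpha beta)_ij are zero.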

This tutorial explains how to conduct a two-way ANOVA in Python.

Example: Two-Way ANOVA in Python

A botanist wants to know whether or not plant growth is influenced by sunlight exposure and watering frequency. She plants 30 seeds and lets them grow for two months under different conditions for sunlight exposure and watering frequency. After two months, she records the height of each plant, in inches.

Use the following steps to perform a two-way ANOVA to determine if watering frequency and sunlight exposure have a significant effect on plant growth, and to determine if there is any interaction effect between watering frequency and sunlight exposure.

Step 1: Enter the data.

First, we’ll create a pandas DataFrame that contains the following three variables:

  • water: how frequently each plant was watered: daily or weekly
  • sun: how much sunlight exposure each plant received: low, medium, or high
  • height: the height of each plant (in inches) after two months

import numpy as np
import pandas as pd

#create data
df = pd.DataFrame({'water': np.repeat(['daily', 'weekly'], 15),
                   'sun': np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
                   'height': [6, 6, 6, 5, 6, 5, 5, 6, 4, 5,
                              6, 6, 7, 8, 7, 3, 4, 4, 4, 5,
                              4, 4, 4, 4, 4, 5, 6, 6, 7, 8]})

#view first ten rows of data
df.head(10)

	water	sun	height
0	daily	low	6
1	daily	low	6
2	daily	low	6
3	daily	low	5
4	daily	low	6
5	daily	med	5
6	daily	med	5
7	daily	med	6
8	daily	med	4
9	daily	med	5
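
Before fitting the model, it can help to check how many plants fall into each combination of factors and what the average height is in each. The following is a minimal sketch that rebuilds the same DataFrame; the names cell_counts and cell_means are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in Step 1
df = pd.DataFrame({'water': np.repeat(['daily', 'weekly'], 15),
                   'sun': np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
                   'height': [6, 6, 6, 5, 6, 5, 5, 6, 4, 5,
                              6, 6, 7, 8, 7, 3, 4, 4, 4, 5,
                              4, 4, 4, 4, 4, 5, 6, 6, 7, 8]})

# Number of plants in each water x sun cell
# (a balanced design has the same count in every cell)
cell_counts = df.groupby(['water', 'sun'])['height'].count().unstack()

# Mean height in each water x sun cell
cell_means = df.groupby(['water', 'sun'])['height'].mean().unstack()

print(cell_counts)
print(cell_means)
```

Equal counts in every cell indicate a balanced design, and the table of cell means gives a first look at how height changes across the levels of each factor.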

Step 2: Perform the two-way ANOVA.

Next, we’ll perform the two-way ANOVA using the anova_lm() function from the statsmodels library:

import statsmodels.api as sm
from statsmodels.formula.api import ols

#perform two-way ANOVA
model = ols('height ~ C(water) + C(sun) + C(water):C(sun)', data=df).fit()
sm.stats.anova_lm(model, typ=2)

                    sum_sq    df        F    PR(>F)
C(water)          8.533333   1.0  16.0000  0.000527
C(sun)           24.866667   2.0  23.3125  0.000002
C(water):C(sun)   2.466667   2.0   2.3125  0.120667
Residual         12.800000  24.0      NaN       NaN
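
To see where the numbers in this table come from, the sums of squares and F statistics can be reproduced by hand with pandas. This is a sketch that assumes a balanced design (equal counts in every cell, as in this dataset), which is what lets the interaction sum of squares be obtained by subtraction. The helper ss_main and the variable names are introduced here for illustration and are not part of statsmodels.

```python
import numpy as np
import pandas as pd

# Same data as in Step 1
df = pd.DataFrame({'water': np.repeat(['daily', 'weekly'], 15),
                   'sun': np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
                   'height': [6, 6, 6, 5, 6, 5, 5, 6, 4, 5,
                              6, 6, 7, 8, 7, 3, 4, 4, 4, 5,
                              4, 4, 4, 4, 4, 5, 6, 6, 7, 8]})

grand_mean = df['height'].mean()

def ss_main(factor):
    """Main-effect sum of squares: group size times the squared
    deviation of each group mean from the grand mean."""
    g = df.groupby(factor)['height']
    return float((g.count() * (g.mean() - grand_mean) ** 2).sum())

ss_water = ss_main('water')
ss_sun = ss_main('sun')

# Residual: squared deviation of each plant from its own cell mean
cell_mean = df.groupby(['water', 'sun'])['height'].transform('mean')
ss_resid = float(((df['height'] - cell_mean) ** 2).sum())

# Interaction: what is left of the total (balanced design only)
ss_total = float(((df['height'] - grand_mean) ** 2).sum())
ss_inter = ss_total - ss_water - ss_sun - ss_resid

# Degrees of freedom: 1 for water, 2 for sun, 2 for interaction,
# and 30 - (2 * 3) = 24 for the residual
ms_resid = ss_resid / (len(df) - 2 * 3)
f_water = (ss_water / 1) / ms_resid
f_sun = (ss_sun / 2) / ms_resid
f_inter = (ss_inter / 2) / ms_resid

print(ss_water, ss_sun, ss_inter, ss_resid)
print(f_water, f_sun, f_inter)
```

Each F statistic is the factor's mean square (sum of squares divided by its degrees of freedom) divided by the residual mean square.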

Step 3: Interpret the results.

We can see the following p-values for each of the factors in the table:

  • water: p-value = .000527
  • sun: p-value = .000002
  • water*sun: p-value = .120667

Since the p-values for water and sun are both less than .05, this means that both factors have a statistically significant effect on plant height.

And since the p-value for the interaction effect (.120667) is not less than .05, this tells us that there is no significant interaction effect between sunlight exposure and watering frequency.
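
The p-values in the table can also be recovered from the F statistics and their degrees of freedom. The sketch below uses the survival function of the F distribution from SciPy, with the F values and degrees of freedom copied from the ANOVA table. The dictionary names and the 0.05 threshold variable alpha are chosen here for illustration.

```python
from scipy import stats

alpha = 0.05
df_resid = 24  # residual degrees of freedom from the ANOVA table

# (F statistic, numerator degrees of freedom) for each term in the table
table = {'water': (16.0, 1),
         'sun': (23.3125, 2),
         'water:sun': (2.3125, 2)}

# p-value = probability of an F value at least this large under the null
p_values = {name: float(stats.f.sf(f, dfn, df_resid))
            for name, (f, dfn) in table.items()}

for name, p in p_values.items():
    print(f"{name}: p = {p:.6f}, significant at {alpha}: {p < alpha}")
```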

Note: Although the ANOVA results tell us that watering frequency and sunlight exposure have a statistically significant effect on plant height, we would need to perform post-hoc tests to determine exactly how different levels of water and sunlight affect plant height.
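
One common choice of post-hoc test is Tukey's HSD, available in statsmodels as pairwise_tukeyhsd(). The sketch below compares the three sunlight levels only. Note that this is a simplification: it treats sun as a single factor and ignores water, so it is a one-way comparison rather than a post-hoc test on the full two-way model.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Same data as in Step 1
df = pd.DataFrame({'water': np.repeat(['daily', 'weekly'], 15),
                   'sun': np.tile(np.repeat(['low', 'med', 'high'], 5), 2),
                   'height': [6, 6, 6, 5, 6, 5, 5, 6, 4, 5,
                              6, 6, 7, 8, 7, 3, 4, 4, 4, 5,
                              4, 4, 4, 4, 4, 5, 6, 6, 7, 8]})

# Pairwise comparisons between sunlight levels
# (assumption: water is ignored here, so this is a one-way comparison)
tukey = pairwise_tukeyhsd(endog=df['height'], groups=df['sun'], alpha=0.05)

print(tukey.summary())
```

The summary lists each pair of sunlight levels with the difference in mean height, an adjusted p-value, and whether the null hypothesis of equal means is rejected. Because water has only two levels, its significant main effect already identifies which level differs.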
