How to Perform a Repeated Measures ANOVA in Python


A repeated measures ANOVA is used to determine whether or not there is a statistically significant difference between the means of three or more groups in which the same subjects show up in each group.

This tutorial explains how to conduct a one-way repeated measures ANOVA in Python.

Example: Repeated Measures ANOVA in Python

Researchers want to know if four different drugs lead to different reaction times. To test this, they measure the reaction time of five patients on the four different drugs.

Since each patient is measured on each of the four drugs, we will use a repeated measures ANOVA to determine if the mean reaction time differs between drugs.

Use the following steps to perform the repeated measures ANOVA in Python.

Step 1: Enter the data.

First, we’ll create a pandas DataFrame to hold our data:

import numpy as np
import pandas as pd

#create data
df = pd.DataFrame({'patient': np.repeat([1, 2, 3, 4, 5], 4),
                   'drug': np.tile([1, 2, 3, 4], 5),
                   'response': [30, 28, 16, 34,
                                14, 18, 10, 22,
                                24, 20, 18, 30,
                                38, 34, 20, 44, 
                                26, 28, 14, 30]})

#view first ten rows of data
df.head(10)


	patient	drug	response
0	1	1	30
1	1	2	28
2	1	3	16
3	1	4	34
4	2	1	14
5	2	2	18
6	2	3	10
7	2	4	22
8	3	1	24
9	3	2	20	   
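Before running the test, it can help to view the data in wide format (one row per patient, one column per drug) and confirm the design is balanced. The statsmodels documentation notes that AnovaRM currently supports only fully balanced within-subject designs. The sketch below repeats the Step 1 data so it runs on its own. The names `wide`, `counts`, and `is_balanced` are illustrative choices, not part of the original tutorial.

```python
import numpy as np
import pandas as pd

# Same long-format data as Step 1: one row per (patient, drug) measurement
df = pd.DataFrame({'patient': np.repeat([1, 2, 3, 4, 5], 4),
                   'drug': np.tile([1, 2, 3, 4], 5),
                   'response': [30, 28, 16, 34,
                                14, 18, 10, 22,
                                24, 20, 18, 30,
                                38, 34, 20, 44,
                                26, 28, 14, 30]})

# Balanced design: every patient has exactly one measurement for every drug
counts = df.groupby(['patient', 'drug']).size()
n_patients = df['patient'].nunique()
n_drugs = df['drug'].nunique()
is_balanced = bool(counts.eq(1).all() and len(counts) == n_patients * n_drugs)
print("Balanced:", is_balanced)

# Wide format: rows are patients, columns are drugs
wide = df.pivot(index='patient', columns='drug', values='response')
print(wide)

# Mean response for each drug (column means)
print(wide.mean())
```

If `is_balanced` is False, some patient is missing a drug or has repeated measurements, and the data need attention before calling AnovaRM.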

Step 2: Perform the repeated measures ANOVA.

Next, we will perform the repeated measures ANOVA using the AnovaRM class from the statsmodels library:

from statsmodels.stats.anova import AnovaRM

#perform the repeated measures ANOVA
print(AnovaRM(data=df, depvar='response', subject='patient', within=['drug']).fit())

              Anova
==================================
     F Value Num DF  Den DF Pr > F
----------------------------------
drug 24.7589 3.0000 12.0000 0.0000
==================================
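The printed summary rounds every value to four decimal places. A minimal sketch, assuming a recent statsmodels release, of reading the same numbers from the fitted result's `anova_table` DataFrame so that the F statistic and p-value are available at full precision:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Same data as Step 1
df = pd.DataFrame({'patient': np.repeat([1, 2, 3, 4, 5], 4),
                   'drug': np.tile([1, 2, 3, 4], 5),
                   'response': [30, 28, 16, 34,
                                14, 18, 10, 22,
                                24, 20, 18, 30,
                                38, 34, 20, 44,
                                26, 28, 14, 30]})

# Fit the one-way repeated measures ANOVA with drug as the within-subject factor
res = AnovaRM(data=df, depvar='response', subject='patient', within=['drug']).fit()

# anova_table is a DataFrame indexed by effect name, with unrounded values
table = res.anova_table
f_value = table.loc['drug', 'F Value']
p_value = table.loc['drug', 'Pr > F']
print(table)
print(f"F = {f_value:.4f}, p = {p_value:.2e}")
```

Printing the p-value in scientific notation shows that it is small but not actually zero.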

Step 3: Interpret the results.

A repeated measures ANOVA uses the following null and alternative hypotheses:

The null hypothesis (H0): µ1 = µ2 = µ3 = µ4 (the population means are all equal)

The alternative hypothesis (Ha): at least one population mean is different from the rest

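In this example, the F test statistic is 24.7589 and the corresponding p-value is displayed as 0.0000. The p-value is not exactly zero; it is simply smaller than the four decimal places shown.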

Since this p-value is less than 0.05, we reject the null hypothesis and conclude that there is a statistically significant difference in mean response times between the four drugs.
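To see where F = 24.7589 comes from, the sketch below partitions the variation by hand with NumPy and uses SciPy's F distribution for the p-value. In a one-way repeated measures design, the total sum of squares splits into a drug (condition) part, a patient (subject) part, and a residual error part. The F statistic is MS_drug / MS_error with k - 1 and (k - 1)(n - 1) degrees of freedom, where k = 4 drugs and n = 5 patients. Storing the data as an array with patients as rows and drugs as columns is an assumption of this sketch.

```python
import numpy as np
from scipy import stats

# Rows = patients (subjects), columns = drugs (conditions); same values as Step 1
y = np.array([[30, 28, 16, 34],
              [14, 18, 10, 22],
              [24, 20, 18, 30],
              [38, 34, 20, 44],
              [26, 28, 14, 30]], dtype=float)
n, k = y.shape
grand_mean = y.mean()

ss_total = ((y - grand_mean) ** 2).sum()
ss_cond = n * ((y.mean(axis=0) - grand_mean) ** 2).sum()  # between drugs
ss_subj = k * ((y.mean(axis=1) - grand_mean) ** 2).sum()  # between patients
ss_error = ss_total - ss_cond - ss_subj                    # drug-by-patient residual

df_cond = k - 1
df_error = (k - 1) * (n - 1)
f_stat = (ss_cond / df_cond) / (ss_error / df_error)
p_value = stats.f.sf(f_stat, df_cond, df_error)  # upper-tail probability

print(f"SS_drug={ss_cond:.1f}, SS_patient={ss_subj:.1f}, SS_error={ss_error:.1f}")
print(f"F({df_cond}, {df_error}) = {f_stat:.4f}, p = {p_value:.2e}")
```

For these data the sums of squares are SS_drug = 698.2, SS_patient = 680.8, and SS_error = 112.8, so F(3, 12) = (698.2 / 3) / (112.8 / 12) ≈ 24.7589, matching the AnovaRM output. Removing the patient-to-patient variation from the error term is what distinguishes this from a one-way ANOVA on independent groups.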

Step 4: Report the results.

Lastly, we will report the results of our repeated measures ANOVA. Here is an example of how to do so:

A one-way repeated measures ANOVA was conducted on 5 individuals to examine the effect that four different drugs had on response time.

 

Results showed that the type of drug used led to statistically significant differences in response time (F(3, 12) = 24.76, p < 0.001).
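The reported statistic can be generated from the ANOVA output instead of being typed by hand. `format_rm_anova` below is a hypothetical helper (not part of statsmodels) that rounds F to two decimals and reports very small p-values as an upper bound, matching the style of the sentence above. Recomputing the p-value from F with SciPy is an assumption of the sketch; the value from `anova_table` could be passed in instead.

```python
from scipy import stats


def format_rm_anova(f_value, df_num, df_den, p_value, p_floor=0.001):
    """Format an F test as 'F(df1, df2) = x.xx, p ...' (hypothetical helper)."""
    # Degrees of freedom are whole numbers here, even when stored as floats
    f_text = f"F({int(df_num)}, {int(df_den)}) = {f_value:.2f}"
    # Very small p-values are reported as an upper bound rather than as 0.000
    if p_value < p_floor:
        p_text = f"p < {p_floor}"
    else:
        p_text = f"p = {p_value:.3f}"
    return f"{f_text}, {p_text}"


# F and degrees of freedom from the Step 2 output; p recomputed from the F distribution
p = stats.f.sf(24.7589, 3, 12)
report = format_rm_anova(24.7589, 3.0, 12.0, p)
print(report)  # F(3, 12) = 24.76, p < 0.001
```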

Additional Resources

The following tutorials provide additional information on repeated measures ANOVAs:

One-Way ANOVA vs. Repeated Measures ANOVA: The Difference
How to Perform a Repeated Measures ANOVA By Hand
The Three Assumptions of the Repeated Measures ANOVA
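The first comment below asks how to test sphericity, the assumption that the variances of the differences between every pair of within-subject conditions are equal. Pingouin offers this test as `pingouin.sphericity`. As an alternative that needs only NumPy and SciPy, here is a sketch of Mauchly's test with its standard chi-square approximation. The function name `mauchly_test` is hypothetical, and the sketch assumes complete wide-format data with at least as many subjects as conditions, so that the contrast covariance matrix can be nonsingular. With only five patients the test has little power, so a non-significant result is weak evidence that sphericity holds.

```python
import numpy as np
from scipy import stats


def mauchly_test(y):
    """Mauchly's test of sphericity (hypothetical helper, not a library API).

    y: array of shape (n_subjects, k_conditions), one complete row per subject.
    Returns (W, chi_square, dof, p_value).
    """
    y = np.asarray(y, dtype=float)
    n, k = y.shape
    p = k - 1
    # QR of [ones, e_1, ..., e_{k-1}]: the first column of q is proportional to
    # the ones vector, so the remaining k - 1 columns are orthonormal contrasts
    basis = np.column_stack([np.ones(k), np.eye(k)[:, :p]])
    q, _ = np.linalg.qr(basis)
    contrasts = q[:, 1:]
    # Sample covariance of the contrast scores
    s = np.cov(y, rowvar=False)
    t = contrasts.T @ s @ contrasts
    # W = product of eigenvalues / (mean eigenvalue)^p, which lies in (0, 1]
    w = np.linalg.det(t) / (np.trace(t) / p) ** p
    # Chi-square approximation with the usual small-sample correction factor
    d = 1 - (2 * p**2 + p + 2) / (6 * p * (n - 1))
    chi_square = -(n - 1) * d * np.log(w)
    dof = p * (p + 1) // 2 - 1
    p_value = stats.chi2.sf(chi_square, dof)
    return w, chi_square, dof, p_value


# Rows = patients, columns = drugs (same values as Step 1)
y = np.array([[30, 28, 16, 34],
              [14, 18, 10, 22],
              [24, 20, 18, 30],
              [38, 34, 20, 44],
              [26, 28, 14, 30]])
w, chi_square, dof, p_value = mauchly_test(y)
print(f"W = {w:.4f}, chi2({dof}) = {chi_square:.3f}, p = {p_value:.3f}")
```

Because W does not depend on which orthonormal contrasts are chosen, reordering the drug columns leaves the result unchanged. If sphericity is rejected, a common follow-up is a corrected test such as Greenhouse-Geisser.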

4 Replies to “How to Perform a Repeated Measures ANOVA in Python”

  1. Thanks for this. My question is, how can one test for sphericity in python 3.8? I know the function is available in Pingouin, but Pingouin doesn’t exist in 3.8 apparently…

  2. Good example! df.head[:10] does not work, change it to df.head(10). After “type of drug used”, change lead to led.

  3. Hello, I was wondering which hypotheses we need to check and how to compute them?

    Thank you!!

    1. Hi Justine…In ANOVA (Analysis of Variance), we are typically interested in comparing the means of multiple groups to determine if at least one group mean is significantly different from the others. Here are the key hypotheses and steps involved:

      ### Hypotheses in ANOVA

      1. **Null Hypothesis (H0)**: All group means are equal.
      \[
      H_0: \mu_1 = \mu_2 = \mu_3 = \ldots = \mu_k
      \]
      where \(\mu_i\) represents the mean of the \(i\)-th group.

      2. **Alternative Hypothesis (Ha)**: At least one group mean is different.
      \[
      H_a: \text{Not all group means are equal}
      \]

      ### Steps to Perform ANOVA

      1. **Calculate the Group Means**:
      For each group, compute the mean \(\bar{X}_i\).

      2. **Calculate the Overall Mean**:
      Compute the grand mean \(\bar{X}\), which is the mean of all observations across all groups.

      3. **Compute the Sum of Squares**:
      - **Total Sum of Squares (SST)**:
      \[
      SST = \sum_{i=1}^k \sum_{j=1}^{n_i} (X_{ij} - \bar{X})^2
      \]
      where \(X_{ij}\) is the \(j\)-th observation in the \(i\)-th group, \(k\) is the number of groups, and \(n_i\) is the number of observations in the \(i\)-th group.

      - **Between-Group Sum of Squares (SSB)**:
      \[
      SSB = \sum_{i=1}^k n_i (\bar{X}_i - \bar{X})^2
      \]

      - **Within-Group Sum of Squares (SSW)** (also called Error Sum of Squares):
      \[
      SSW = \sum_{i=1}^k \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2
      \]

      4. **Compute the Degrees of Freedom**:
      - Between-group degrees of freedom: \(df_B = k - 1\)
      - Within-group degrees of freedom: \(df_W = N - k\)
      - Total degrees of freedom: \(df_T = N - 1\)
      where \(N\) is the total number of observations.

      5. **Calculate the Mean Squares**:
      - Mean square between groups (MSB):
      \[
      MSB = \frac{SSB}{df_B}
      \]

      - Mean square within groups (MSW):
      \[
      MSW = \frac{SSW}{df_W}
      \]

      6. **Compute the F-statistic**:
      \[
      F = \frac{MSB}{MSW}
      \]

      7. **Compare the F-statistic to the Critical Value**:
      - Determine the critical value from the F-distribution table using \(df_B\) and \(df_W\).
      - If \(F\) is greater than the critical value, reject the null hypothesis.

      8. **P-value Approach**:
      - Alternatively, compute the p-value associated with the calculated F-statistic.
      - If the p-value is less than the significance level (typically 0.05), reject the null hypothesis.

      ### Example Calculation

      Let’s consider an example with three groups and their means:

      1. Group 1: \( \bar{X}_1 = 10 \)
      2. Group 2: \( \bar{X}_2 = 20 \)
      3. Group 3: \( \bar{X}_3 = 30 \)

      Assume the overall mean \( \bar{X} = 20 \) and each group has 5 observations.

      1. **Calculate SSB**:
      \[
      SSB = 5 (10 - 20)^2 + 5 (20 - 20)^2 + 5 (30 - 20)^2 = 5(100) + 5(0) + 5(100) = 1000
      \]

      2. **Calculate SSW**:
      Let’s assume \(SSW\) is calculated to be 600.

      3. **Degrees of Freedom**:
      \[
      df_B = 3 - 1 = 2, \quad df_W = 15 - 3 = 12
      \]

      4. **Mean Squares**:
      \[
      MSB = \frac{1000}{2} = 500, \quad MSW = \frac{600}{12} = 50
      \]

      5. **F-statistic**:
      \[
      F = \frac{500}{50} = 10
      \]

      6. **Compare F-statistic**:
      Check the F-distribution table for \(df_B = 2\) and \(df_W = 12\). At a significance level of 0.05 the critical value is about 3.89, and since 10 > 3.89, we reject the null hypothesis.

      This process allows us to conclude whether there are significant differences among group means.
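The reply above walks through a one-way between-subjects ANOVA. A minimal sketch of those steps, starting from group means, group sizes, and an assumed SSW as in the reply's example: `one_way_anova_from_summary` is a hypothetical helper, and SciPy's F distribution supplies the critical value and p-value in place of a printed table. These formulas do not apply unchanged to repeated measures data such as the drug example, where the between-subject variation is removed from the within-group sum of squares before the error term is formed.

```python
import numpy as np
from scipy import stats


def one_way_anova_from_summary(means, ns, ssw, alpha=0.05):
    """One-way between-subjects ANOVA from group means, group sizes, and SSW."""
    means = np.asarray(means, dtype=float)
    ns = np.asarray(ns, dtype=float)
    k, N = len(means), ns.sum()
    grand_mean = (ns * means).sum() / N            # overall mean of all observations
    ssb = (ns * (means - grand_mean) ** 2).sum()   # between-group sum of squares
    df_b, df_w = k - 1, int(N) - k
    msb, msw = ssb / df_b, ssw / df_w
    f_stat = msb / msw
    f_crit = stats.f.ppf(1 - alpha, df_b, df_w)    # critical value at level alpha
    p_value = stats.f.sf(f_stat, df_b, df_w)       # upper-tail p-value
    return {"SSB": ssb, "df_B": df_b, "df_W": df_w, "MSB": msb, "MSW": msw,
            "F": f_stat, "F_crit": f_crit, "p": p_value}


# Numbers from the example: three groups of 5 with means 10, 20, 30 and SSW = 600
result = one_way_anova_from_summary([10, 20, 30], [5, 5, 5], ssw=600)
print(result)
```

For the reply's numbers this returns SSB = 1000, MSB = 500, MSW = 50, and F = 10, with a critical value of about 3.89 for df_B = 2 and df_W = 12 at the 0.05 level.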
