How to Conduct a MANOVA in R

How to conduct a MANOVA in R

This tutorial explains how to conduct a MANOVA (multivariate analysis of variance) in R.

What is a MANOVA?

To understand the MANOVA, it first helps to understand the ANOVA.

An ANOVA (analysis of variance) is used to determine whether or not there is a statistically significant difference between the means of three or more independent groups. 

For example, suppose we want to know whether or not studying technique has an impact on exam scores for a class of students. We randomly split the class into three groups. Each group uses a different studying technique for one month to prepare for an exam. At the end of the month, all of the students take the same exam.

To find out if studying technique impacts exam scores, we can conduct a one-way ANOVA, which will tell us if if there is a statistically significant difference between the mean scores of the three groups.

One way ANOVA example

In an ANOVA, we have one response variable. However, in a MANOVA (multivariate analysis of variance) we have multiple response variables.

For example, suppose we want to know how level of education (i.e. high school, associates degree, bachelors degrees, masters degree, etc.) impacts both annual income and amount of student loan debt. In this case, we have one factor (level of education) and two response variables (annual income and student loan debt), so we could conduct a one-way MANOVA.

One-Way MANOVA Example

Related: Understanding the Differences Between ANOVA, ANCOVA, MANOVA, and MANCOVA

How to Conduct a MANOVA in R

In the following example, we’ll illustrate how to conduct a one-way MANOVA in R using the built-in dataset iris, which contains information about the length and width of different measurements of flowers for three different species (“setosa”, “virginica”, “versicolor”):

#view first six rows of iris dataset
head(iris)

#  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#1          5.1         3.5          1.4         0.2  setosa
#2          4.9         3.0          1.4         0.2  setosa
#3          4.7         3.2          1.3         0.2  setosa
#4          4.6         3.1          1.5         0.2  setosa
#5          5.0         3.6          1.4         0.2  setosa
#6          5.4         3.9          1.7         0.4  setosa

Suppose we want to know if species has any effect on sepal length and sepal width. Using species as the independent variable, and sepal length and sepal width as the response variables, we can conduct a one-way MANOVA using the manova() function in R.

The manova() function uses the following syntax:

manova(cbind(response_variable1, response_variable 2, …) ~ independent_variable, data = data)

In our example with the iris dataset, we can fit a MANOVA and view the results using the following syntax:

#fit the MANOVA model
model <- manova(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)

#view the results
summary(model)
#           Df  Pillai approx F num Df den Df    Pr(>F)    
#Species     2 0.94531   65.878      4    294 < 2.2e-16 ***
#Residuals 147                                             
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the output we can see that the F-statistic is 65.878 and the corresponding p-value is extremely small. This indicates that there is a statistically significant difference in sepal measurements based on species. 

Technical Note: By default, manova() uses the Pillai test statistic. Since the distribution of this test statistic is complex, an approximate F value is also provided for easier interpretation.

 

In addition, it’s possible to specify “Roy”, “Hotelling-Lawley”, or “Wilks” as the test statistic to be used by using the following syntax: summary(model, test = ‘Wilks’)

To find out exactly how both sepal length and sepal width are affected by species, we can perform univariate ANOVAs using summary.aov() as shown in the following code:

summary.aov(model)


# Response Sepal.Length :
#             Df Sum Sq Mean Sq F value    Pr(>F)    
#Species       2 63.212  31.606  119.26 < 2.2e-16 ***
#Residuals   147 38.956   0.265                      
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Response Sepal.Width :
#             Df Sum Sq Mean Sq F value    Pr(>F)    
#Species       2 11.345  5.6725   49.16 < 2.2e-16 ***
#Residuals   147 16.962  0.1154                      
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can see from the output that the p-values for both univariate ANOVAs are extremely small (<2.2e-16), which indicates that species has a statistically significant effect on both sepal width and sepal length.

Visualizing Group Means

It can also be helpful to visualize the group means for each level of our independent variable species to gain a better understanding of our results.

For example, we can use the gplots library and the plotmeans() function to visualize the mean sepal length by species:

#load gplots library
library(gplots)

#visualize mean sepal length by species
plotmeans(iris$Sepal.Length ~ iris$Species)

gplots plot_means() function in R

From the plot we can see that the mean sepal length varies quite a bit by species. This matches the results from our MANOVA, which told us that there was a statistically significant difference in sepal measurements based on species.

We can also visualize the mean sepal width by species:

plotmeans(iris$Sepal.Width ~ iris$Species)

MANOVA in R with plots

View the full RDocumentation for the manova() function here.

Leave a Reply

Your email address will not be published. Required fields are marked *