In statistics, ANOVA (“analysis of variance”) models are used to determine whether or not the means of different treatment levels are equal.
An ANOVA has a balanced design if the sample sizes are equal across all treatment combinations.
Conversely, an ANOVA has an unbalanced design if the sample sizes are not equal across all treatment combinations.
For example, suppose we want to perform a one-way ANOVA to determine if three different fertilizers cause the same mean growth in plants.
The following graphic shows an example of a balanced and unbalanced design for this one-way ANOVA:
In the balanced design, there are an equal number of plants in each treatment. In the unbalanced design, there are unequal sample sizes.
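As a minimal sketch of how this check might look in practice, the snippet below counts the plants per fertilizer and reports whether the counts are all equal. The labels, the sample sizes, and the helper name `is_balanced` are made up for illustration.

```python
from collections import Counter

# Hypothetical data: the fertilizer assigned to each plant in the study.
fertilizer = ["A"] * 5 + ["B"] * 5 + ["C"] * 3


def is_balanced(labels):
    """Return (balanced?, counts per treatment level) for a one-way design."""
    counts = Counter(labels)
    # The design is balanced when every treatment level has the same count.
    return len(set(counts.values())) == 1, dict(counts)


balanced, counts = is_balanced(fertilizer)
print(counts, "balanced" if balanced else "unbalanced")
```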
Or suppose we want to perform a two-way ANOVA to determine if different combinations of fertilizer and sunlight cause the same mean growth in plants.
The following graphic shows an example of a balanced and unbalanced design for this two-way ANOVA:
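For a two-way design, the sample size that matters is the count in each fertilizer-by-sunlight cell. The sketch below tabulates those counts, including cells with no observations. The data and the helper name `cell_counts` are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical data: one (fertilizer, sunlight) pair per plant.
observations = (
    [("A", "low")] * 4 + [("A", "high")] * 4
    + [("B", "low")] * 4 + [("B", "high")] * 2
)


def cell_counts(pairs):
    """Count observations in every combination of the two factors."""
    counts = Counter(pairs)
    levels_a = sorted({a for a, _ in pairs})
    levels_b = sorted({b for _, b in pairs})
    # Walk the full grid so that empty cells show up with a count of 0.
    return {cell: counts[cell] for cell in product(levels_a, levels_b)}


counts = cell_counts(observations)
design_is_balanced = len(set(counts.values())) == 1
print(counts, "balanced" if design_is_balanced else "unbalanced")
```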
Why is a Balanced Design Preferred?
Balanced designs offer the following advantages over unbalanced designs:
1. For a fixed total sample size, the power of an ANOVA is generally highest when sample sizes are equal across all treatment combinations. Higher power means a better chance of detecting differences among the treatment means when those means truly differ.
2. In a balanced design, the overall F-statistic of the ANOVA is less sensitive to violations of the assumption of equal variances.
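Both points can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings: three groups, 30 observations in total, normally distributed data, and specific means and standard deviations chosen for the example. The critical value is the upper 5% point of the F distribution with (2, 27) degrees of freedom, taken from a standard F table. The exact rates depend on the chosen settings, including which group ends up small.

```python
import random
from statistics import fmean

# Upper 5% critical value of F with (2, 27) degrees of freedom, from a
# standard F table. It only applies to 3 groups and 30 total observations.
F_CRIT_2_27 = 3.354


def f_statistic(groups):
    """One-way ANOVA F-statistic for a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        m = fmean(g)
        ss_between += len(g) * (m - grand_mean) ** 2
        ss_within += sum((x - m) ** 2 for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def rejection_rate(sizes, means, sds, n_sims=2000, seed=0):
    """Share of simulated experiments in which the F-test rejects at 5%."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        groups = [
            [rng.gauss(mu, sd) for _ in range(n)]
            for n, mu, sd in zip(sizes, means, sds)
        ]
        if f_statistic(groups) > F_CRIT_2_27:
            rejections += 1
    return rejections / n_sims


balanced_sizes, unbalanced_sizes = (10, 10, 10), (4, 13, 13)

# Point 1 (power): the means truly differ and the variances are equal.
power_bal = rejection_rate(balanced_sizes, (0, 0.6, 1.2), (1, 1, 1))
power_unbal = rejection_rate(unbalanced_sizes, (0, 0.6, 1.2), (1, 1, 1))

# Point 2 (unequal variances): the means are equal, so every rejection is
# a false positive. The first group has the largest spread.
fp_bal = rejection_rate(balanced_sizes, (0, 0, 0), (3, 1, 1))
fp_unbal = rejection_rate(unbalanced_sizes, (0, 0, 0), (3, 1, 1))

print(f"power:          balanced={power_bal:.3f}  unbalanced={power_unbal:.3f}")
print(f"false positive: balanced={fp_bal:.3f}  unbalanced={fp_unbal:.3f}")
```

In this setup the unbalanced design pairs the smallest group with the largest variance, which is the situation where the F-test tends to reject too often.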
How do Unbalanced Designs Occur?
Even if researchers attempt to set up a balanced design for an ANOVA, there are several reasons why an unbalanced design could occur, including:
- Individuals may decide to opt out of a study halfway through
- Plants may simply die during the course of the study
- A manufacturing plant may shut down and not be able to deliver certain components needed for a study
In practice, many unforeseen events can turn a planned balanced design into an unbalanced one.
How to Handle Unbalanced Designs
As mentioned earlier, balanced designs are preferred because they offer higher statistical power and more reliable test statistics.
However, if you do have to perform an experiment using an unbalanced design, you have three choices:
1. Proceed with an ANOVA anyway.
If the sample sizes across treatment combinations are not equal, but the assumption of equal variances is met, you can still perform an ANOVA.
It’s well known that ANOVAs are fairly robust to unequal sample sizes as long as the variances across treatment combinations are equal.
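Before relying on this option, it helps to look at the spread within each group. The sketch below computes the sample variance per group and the ratio of the largest to the smallest variance as an informal check. The data and the helper name `variance_ratio` are hypothetical, and the ratio is a descriptive summary rather than a formal test such as Levene's test.

```python
from statistics import variance

# Hypothetical plant growth measurements with unequal sample sizes.
growth = {
    "A": [10, 12, 11, 13],
    "B": [14, 15, 13, 16, 15, 14],
    "C": [9, 11, 10, 12, 10],
}


def variance_ratio(groups):
    """Return (largest / smallest sample variance, variance per group)."""
    variances = {name: variance(values) for name, values in groups.items()}
    return max(variances.values()) / min(variances.values()), variances


ratio, variances = variance_ratio(growth)
print(variances)
print(f"largest-to-smallest variance ratio: {ratio:.2f}")
```

A ratio close to 1 suggests similar spread across groups. How large a ratio is tolerable is a judgment call that depends on the design.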
2. Impute missing values.
If there are only slight differences in sample sizes between treatment combinations, you could impute the missing values using the mean or median of the corresponding treatment combination.
However, imputed values make the groups look less variable than they really are, so this approach should be used with caution and only when sample sizes are nearly equal to begin with.
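Mean imputation could be sketched as follows. The data and the helper name `impute_to_balance` are hypothetical, and the function simply pads each group with its own mean until every group matches the largest group.

```python
from statistics import fmean

# Hypothetical growth measurements; group C lost one plant.
growth = {
    "A": [10.0, 12.0, 11.0, 13.0],
    "B": [14.0, 15.0, 13.0, 16.0],
    "C": [9.0, 11.0, 10.0],
}


def impute_to_balance(groups):
    """Pad each group with its own mean up to the largest group size."""
    target = max(len(values) for values in groups.values())
    balanced = {}
    for name, values in groups.items():
        fill = fmean(values)
        balanced[name] = list(values) + [fill] * (target - len(values))
    return balanced


balanced_growth = impute_to_balance(growth)
print(balanced_growth)
```

The group means are unchanged by this padding, but the imputed values carry no new information, which is one reason to keep the number of imputed values small.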
3. Perform a non-parametric test.
If the sample sizes are not equal and the assumption of equal variances is violated, you could instead perform a non-parametric equivalent to an ANOVA such as the Kruskal-Wallis test.
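This type of test is based on ranks rather than raw values, so it does not assume normality and is less affected by unequal sample sizes and unequal variances across treatment combinations, although it still works best when the groups have similarly shaped distributions.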
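To show what the Kruskal-Wallis test computes, here is a minimal sketch of the H statistic using only the standard library, with average ranks for ties and the usual tie correction. The data and the function name `kruskal_wallis_h` are hypothetical. The p-value line uses the chi-square approximation and is written for exactly three groups (2 degrees of freedom), where the chi-square survival function reduces to exp(-H / 2). Statistical libraries provide more general implementations.

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction."""
    # Pool all observations, remembering which group each came from.
    pooled = sorted((x, gi) for gi, g in enumerate(groups) for x in g)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12 / (n * (n + 1)) * sum(
        r ** 2 / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical growth measurements with unequal sample sizes.
groups = [
    [10.1, 12.3, 11.4, 13.0],
    [14.2, 15.1, 13.5, 16.0, 15.6, 14.8],
    [9.2, 11.0, 10.5, 12.1, 10.0],
]

h = kruskal_wallis_h(groups)
# Chi-square approximation, valid as written for 3 groups only (df = 2).
p_value = math.exp(-h / 2)
print(f"H = {h:.3f}, approximate p-value = {p_value:.4f}")
```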