Understanding the Null Hypothesis for ANOVA Models

A one-way ANOVA is used to determine if there is a statistically significant difference between the means of three or more independent groups.

A one-way ANOVA uses the following null and alternative hypotheses:

• H0: μ1 = μ2 = μ3 = … = μk (all of the group means are equal)
• HA: At least one group mean is different from the rest

To decide if we should reject or fail to reject the null hypothesis, we must refer to the p-value in the output of the ANOVA table.

If the p-value is less than some significance level (e.g. 0.05) then we can reject the null hypothesis and conclude that not all group means are equal.
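As a rough illustration of this decision rule, the sketch below computes the F statistic and p-value for a one-way ANOVA by hand. The scores are invented for illustration and are not the data from this article, and the function name `one_way_anova_three_groups` is hypothetical. The p-value step relies on the fact that with exactly three groups (2 numerator degrees of freedom) the upper tail of the F distribution has a simple closed form; for any other number of groups a statistics library would be needed instead.

```python
from statistics import mean


def one_way_anova_three_groups(groups):
    """Return (F statistic, p-value) for a one-way ANOVA on exactly three groups."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value used here assumes exactly three groups")

    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Between-group variation: squared deviation of each group mean from the
    # grand mean, weighted by the group's sample size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: squared deviation of each value from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1               # k - 1 = 2
    df_within = len(all_values) - len(groups)  # N - k

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # For 2 numerator degrees of freedom: P(F > f) = (1 + 2f / df_within) ** (-df_within / 2)
    p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    return f_stat, p_value


# Hypothetical exam scores for three groups (illustrative only).
groups = [
    [80, 85, 90, 75, 88],
    [82, 91, 94, 85, 89],
    [78, 84, 86, 80, 83],
]

alpha = 0.05
f_stat, p_value = one_way_anova_three_groups(groups)
reject_null = p_value < alpha

print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

The comparison `p_value < alpha` is the whole decision: the ANOVA table only supplies the p-value that goes into it.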

A two-way ANOVA is used to determine whether or not there is a statistically significant difference between the means of three or more independent groups that have been split on two variables (sometimes called “factors”).

A two-way ANOVA tests three null hypotheses at the same time:

• The group means are equal across all levels of the first variable (no main effect of the first variable)
• The group means are equal across all levels of the second variable (no main effect of the second variable)
• There is no interaction effect between the two variables

To decide if we should reject or fail to reject each null hypothesis, we must refer to the p-values in the output of the two-way ANOVA table.

The following examples show how to decide whether to reject or fail to reject the null hypothesis in both a one-way ANOVA and a two-way ANOVA.

Example 1: One-Way ANOVA

Suppose we want to know whether or not three different exam prep programs lead to different mean scores on a certain exam. To test this, we recruit 30 students to participate in a study and split them into three groups.

The students in each group are randomly assigned to use one of the three exam prep programs for the next three weeks to prepare for an exam. At the end of the three weeks, all of the students take the same exam.

The exam scores for each group are shown below:

When we enter these values into the One-Way ANOVA Calculator, we receive the following ANOVA table as the output:

Notice that the p-value is 0.11385.

For this particular example, we would use the following null and alternative hypotheses:

• H0: μ1 = μ2 = μ3 (the mean exam score for each group is equal)
• HA: At least one group mean is different from the rest

Since the p-value from the ANOVA table is not less than 0.05, we fail to reject the null hypothesis.

This means we don’t have sufficient evidence to say that there is a statistically significant difference between the mean exam scores of the three groups.

Example 2: Two-Way ANOVA

Suppose a botanist wants to know whether or not plant growth is influenced by sunlight exposure and watering frequency.

She plants 40 seeds and lets them grow for two months under different conditions for sunlight exposure and watering frequency. After two months, she records the height of each plant. The results are shown below:

In the table above, we see that there were five plants grown under each combination of conditions.

For example, there were five plants grown with daily watering and no sunlight, and their heights after two months were 4.8 inches, 4.4 inches, 3.2 inches, 3.9 inches, and 4.4 inches.

She performs a two-way ANOVA in Excel and ends up with the following output:

We can see the following p-values in the output of the two-way ANOVA table:

• The p-value for watering frequency is 0.975975. This is not statistically significant at a significance level of 0.05.
• The p-value for sunlight exposure is 3.9E-8 (0.000000039). This is statistically significant at a significance level of 0.05.
• The p-value for the interaction between watering frequency and sunlight exposure is 0.310898. This is not statistically significant at a significance level of 0.05.

These results indicate that sunlight exposure is the only factor that has a statistically significant effect on plant height.

And because the interaction effect is not statistically significant, we have no evidence that the effect of sunlight exposure differs across the levels of watering frequency.

That is, the data give no indication that watering a plant daily versus weekly changes how sunlight exposure affects its height.
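The three decisions in this example can be written out as a short sketch. The p-values are the ones reported in the output above; the dictionary keys and variable names are just illustrative labels.

```python
alpha = 0.05

# p-values taken from the two-way ANOVA output in this example
p_values = {
    "watering frequency": 0.975975,
    "sunlight exposure": 3.9e-8,
    "interaction": 0.310898,
}

# Each null hypothesis is tested separately against the same significance level.
decisions = {
    effect: "reject H0" if p < alpha else "fail to reject H0"
    for effect, p in p_values.items()
}

for effect, decision in decisions.items():
    print(f"{effect}: p = {p_values[effect]:g} -> {decision}")
```

Only the sunlight exposure test falls below 0.05, which is why it is the only null hypothesis rejected here.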

2 Replies to “Understanding the Null Hypothesis for ANOVA Models”

1. Theo Preller says:

Hi, I’m a student at Stellenbosch University majoring in Conservation Ecology and Entomology and we are currently busy doing stats. I am still at a very entry level of stats understanding, so pages like these are of huge help. I wanted to ask, why is the sum of squares (treatment) for the one way ANOVA so high? I calculated it by hand and got a much lower number, could you please help point out if and where I went wrong?

As I understand it, SSB (treatment) is calculated by finding the mean of each group and the grand mean, and then calculating the sum of squares like this:
GM = 85.5
x1 = 83.4
x2 = 89.3
x3 = 84.7

SSB = (85.5 – 83.4)^2 + (85.5 – 89.3)^2 + (85.5 – 84.7)^2 = 18.65
DF = 2

I would appreciate any help, thank you so much!

1. James Carmichael says:

Hi Theo, your calculation leaves out the number of observations in each group. Here is the calculation written out step by step in Python:

### Sum of Squares Between Groups (SSB)

In a one-way ANOVA, the sum of squares between groups (SSB) measures how much the group means vary around the grand mean, with each squared deviation weighted by the number of observations in that group. It is calculated as follows:

1. **Calculate the group means**:
```python
mean_group1 = 83.4
mean_group2 = 89.3
mean_group3 = 84.7
```

2. **Calculate the grand mean**:
```python
grand_mean = 85.5
```

3. **Calculate the sum of squares between groups (SSB)**:
Assuming each group has `n` observations:
```python
n = 10  # Number of observations in each group

ssb = n * ((mean_group1 - grand_mean)**2 +
           (mean_group2 - grand_mean)**2 +
           (mean_group3 - grand_mean)**2)
```

### Example Calculation

For simplicity, let’s assume each group has 10 observations:
```python
n = 10

ssb = n * ((83.4 - 85.5)**2 +
           (89.3 - 85.5)**2 +
           (84.7 - 85.5)**2)
```

Now calculate each term:
```python
term1 = (83.4 - 85.5)**2  # term1 = (-2.1)**2 = 4.41
term2 = (89.3 - 85.5)**2  # term2 = (3.8)**2 = 14.44
term3 = (84.7 - 85.5)**2  # term3 = (-0.8)**2 = 0.64
```

Sum these squared differences:
```python
sum_of_squared_diffs = term1 + term2 + term3  # sum_of_squared_diffs = 4.41 + 14.44 + 0.64 = 19.49
ssb = n * sum_of_squared_diffs  # ssb = 10 * 19.49 = 194.9
```

So, the sum of squares between groups (SSB) is 194.9, assuming each group has 10 observations.

### Degrees of Freedom (DF)

The degrees of freedom for SSB is the number of groups, `k`, minus one.

For three groups:
```python
k = 3  # Number of groups
df_between = k - 1  # df_between = 3 - 1 = 2
```

### Summary

- **SSB** must weight each squared deviation by the number of observations in its group.
- **DF** is the number of groups minus one.

By ensuring you include the number of observations per group in your SSB calculation, you can get the correct SSB value.
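One further point worth checking: the grand mean of 85.5 used above comes from the question. If the three groups really are the same size, the grand mean must equal the average of the three group means, which for 83.4, 89.3 and 84.7 is 85.8 rather than 85.5. The sketch below recomputes both quantities from the group means and sizes. The equal group size of 10 is an assumption (the article only says 30 students were split into three groups), and the function names are hypothetical.

```python
def grand_mean_from_groups(group_means, group_sizes):
    """Grand mean as the size-weighted average of the group means."""
    total_n = sum(group_sizes)
    return sum(n * m for m, n in zip(group_means, group_sizes)) / total_n


def ss_between(group_means, group_sizes):
    """Sum of squares between groups; also handles unequal group sizes."""
    gm = grand_mean_from_groups(group_means, group_sizes)
    return sum(n * (m - gm) ** 2 for m, n in zip(group_means, group_sizes))


group_means = [83.4, 89.3, 84.7]  # group means quoted in the question
group_sizes = [10, 10, 10]        # assumption: 30 students split evenly

gm = grand_mean_from_groups(group_means, group_sizes)
ssb = ss_between(group_means, group_sizes)

print(round(gm, 2))   # 85.8
print(round(ssb, 2))  # 192.2
```

Under that assumption the SSB comes out to 192.2 instead of 194.9, so it is worth recomputing the grand mean from the raw scores before comparing against the ANOVA table.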