When to Use a Chi-Square Goodness of Fit Test
We use a chi-square goodness of fit test when we want to formally test whether or not a categorical variable follows a hypothesized distribution.
Before we can conduct a chi-square goodness of fit test, we first need to make sure the following conditions are met to ensure that our test will be valid:
- Random: A random sample or random experiment should be used to collect the data.
- Categorical: The variable we are studying should be categorical.
- Size: The expected number of observations at each level of the variable should be at least 5.
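The Size condition can be checked numerically before running the test. The snippet below is a minimal sketch in Python; the function names and the sample inputs are hypothetical and chosen only for illustration.

```python
def expected_counts(proportions, n):
    """Expected count at each level: hypothesized proportion times sample size."""
    return [p * n for p in proportions]


def size_condition_met(proportions, n, minimum=5):
    """Return True when every expected count is at least `minimum`."""
    return all(e >= minimum for e in expected_counts(proportions, n))


# Hypothetical inputs, used only to illustrate the check.
print(size_condition_met([0.25, 0.25, 0.25, 0.25], 40))  # expected counts are 10 each
print(size_condition_met([0.90, 0.05, 0.05], 40))        # two expected counts are only 2
```

Note that the check uses the expected counts (hypothesized proportion times sample size), not the observed counts.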
If these conditions are met, we can then conduct the test. The following example shows how to conduct a chi-square goodness of fit test.
Example: Chi-Square Goodness of Fit Test
An owner of a shop claims that 30% of all his weekend customers visit on Friday, 50% on Saturday, and 20% on Sunday. An independent researcher visits the shop on a random weekend and finds that 91 customers visit on Friday, 104 visit on Saturday, and 65 visit on Sunday.
Is this data consistent with the shop owner’s claim? Use a 0.05 level of significance.
Step 1. State the hypotheses.
The null hypothesis (H0): The shop owner’s claim is correct: 30% of customers visit on Friday, 50% on Saturday, and 20% on Sunday.
The alternative hypothesis (Ha): At least one of the proportions in the null hypothesis is not correct.
Step 2. Determine a significance level to use.
The problem tells us that we are to use a .05 level of significance.
Step 3. Find the test statistic.
The test statistic is $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$
Where $\sum$ is just a fancy symbol that means “sum”, $O_i$ is the observed frequency at level $i$ of the variable, and $E_i$ is the expected frequency at level $i$ of the variable.
There were 260 customers who visited the shop on this particular weekend (91 on Friday + 104 on Saturday + 65 on Sunday).
According to the shop owner, we should expect 30% * 260 = 78 of the total customers to visit on Friday. The observed number of people who visited on Friday was 91. So for Friday we have:
$(O - E)^2 / E = (91 - 78)^2 / 78 = 2.167$
According to the shop owner, we should expect 50% * 260 = 130 of the total customers to visit on Saturday. The observed number of people who visited on Saturday was 104. So for Saturday we have:
$(O - E)^2 / E = (104 - 130)^2 / 130 = 5.2$
According to the shop owner, we should expect 20% * 260 = 52 of the total customers to visit on Sunday. The observed number of people who visited on Sunday was 65. So for Sunday we have:
$(O - E)^2 / E = (65 - 52)^2 / 52 = 3.25$
To find the test statistic, we simply sum up these numbers: 2.167 + 5.2 + 3.25 = 10.617
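The same arithmetic can be reproduced in a few lines of Python. This is a minimal sketch using only the counts and claimed proportions from the example; the variable names are illustrative.

```python
# Observed counts and the shop owner's claimed proportions from the example.
observed = {"Friday": 91, "Saturday": 104, "Sunday": 65}
claimed = {"Friday": 0.30, "Saturday": 0.50, "Sunday": 0.20}

n = sum(observed.values())  # 260 customers in total

chi_square = 0.0
for day, o in observed.items():
    e = claimed[day] * n             # expected count under the owner's claim
    contribution = (o - e) ** 2 / e  # this day's share of the test statistic
    chi_square += contribution
    print(f"{day}: expected {e:.0f}, contribution {contribution:.3f}")

print(f"Test statistic: {chi_square:.3f}")
```

Each expected count here (78, 130, and 52) is at least 5, so the Size condition is also satisfied.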
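Use the Chi-Square Calculator with degrees of freedom = k – 1 = 3 – 1 = 2 (where k is the number of levels of the variable) and Chi-square critical value = 10.617 (our test statistic), then click “Calculate p-value”. The calculator returns .99505, which is the cumulative probability below 10.617. The p-value is the area above the test statistic, so p-value = 1 – .99505 = .00495.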
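The upper-tail probability can also be computed directly. With 2 degrees of freedom, the chi-square upper-tail probability has the closed form exp(–x/2), so the standard library is enough for this example. The sketch below relies on that special case, and the function name is hypothetical; for other degrees of freedom a statistics library routine would be the usual choice.

```python
import math


def chi_square_p_value_df2(statistic):
    """Upper-tail probability of a chi-square distribution with 2 degrees of freedom.

    Valid only for df = 2, where P(X > x) = exp(-x / 2).
    """
    return math.exp(-statistic / 2)


p_value = chi_square_p_value_df2(10.617)
print(f"p-value: {p_value:.5f}")
```

The result agrees with the .00495 obtained above to five decimal places.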
Step 4. Reject or fail to reject the null hypothesis.
Since the p-value (.00495) is less than our significance level of .05, we reject the null hypothesis.
Step 5. Interpret the results.
Since we rejected the null hypothesis, we have sufficient evidence to say that the true distribution of customers who visit this shop on weekends is not 30% on Friday, 50% on Saturday, and 20% on Sunday.