A Chi-Square Test of Independence is used to determine whether or not there is a significant association between two categorical variables.
This test uses the following null and alternative hypotheses:
- H0: (null hypothesis) The two variables are independent.
- H1: (alternative hypothesis) The two variables are not independent (i.e., they are associated).
We use the following formula to calculate the Chi-Square test statistic $X^2$ for this test:
$X^2 = \sum_i (O_i - E_i)^2 / E_i$
- $\sum$: a symbol that means “sum” (here, over every cell in the contingency table)
- $O_i$: the observed value in cell i
- $E_i$: the expected value in cell i
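As a minimal sketch of the formula above, the Python function below sums the per-cell contributions. The function name `chi_square_statistic` and the counts are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over all cells, given flat lists of counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts, used only to illustrate the arithmetic.
observed = [10, 20, 30]
expected = [20, 20, 20]

# Contributions are 100/20, 0/20 and 100/20.
statistic = chi_square_statistic(observed, expected)
print(statistic)
```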
This test assumes that the discrete probabilities of the frequencies in a contingency table can be approximated by the Chi-Square distribution, which is a continuous distribution.
However, this approximation is imperfect, particularly for small samples, and the resulting test statistic tends to be biased upward, which makes the p-value smaller than it should be.
To correct for this bias we can apply Yates' continuity correction, which subtracts 0.5 from the absolute difference between each observed and expected value before squaring:
$X^2 = \sum_i (|O_i - E_i| - 0.5)^2 / E_i$
We typically only use this correction when at least one cell in the contingency table has an expected frequency less than 5.
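To see the effect of the correction, the sketch below computes the statistic with and without the 0.5 adjustment. The function name `yates_chi_square`, its `correction` parameter, and the 2x2 counts are all hypothetical and serve only as an illustration.

```python
def yates_chi_square(observed, expected, correction=0.5):
    """Sum (|O - E| - correction)^2 / E over all cells (flat lists of counts)."""
    return sum((abs(o - e) - correction) ** 2 / e
               for o, e in zip(observed, expected))

# Hypothetical 2x2 table [[12, 8], [8, 12]], flattened row by row.
# Every row and column total is 20, so each expected count is 20 * 20 / 40 = 10.
observed = [12, 8, 8, 12]
expected = [10.0, 10.0, 10.0, 10.0]

uncorrected = yates_chi_square(observed, expected, correction=0.0)
corrected = yates_chi_square(observed, expected)
print(uncorrected, corrected)
```

The corrected value is smaller than the uncorrected one, which is how the correction offsets the upward bias.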
Example: Applying Yates' Continuity Correction
Suppose we want to know whether or not gender is associated with political party preference. We take a simple random sample of 40 voters and survey them on their political party preference. The following table shows the results of the survey:
Here is how to perform a Chi-Square Test of Independence with Yates' continuity correction:
Note: We calculate the expected value in each cell by multiplying the row total by the column total, then dividing by the grand total. For example, the expected number of male Republicans is (21*19)/40 = 9.975.
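The expected-value rule in the note can be sketched in Python as follows. The observed counts are the ones that appear in the calculations in this example; arranging them as two rows of three columns is an assumption, inferred from the row total of 21 and the column total of 19 quoted in the note. The function name `expected_counts` is hypothetical.

```python
def expected_counts(table):
    """Expected counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Each cell: row total * column total / grand total.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Assumed layout: one row per gender, one column per party preference.
observed_table = [[8, 9, 4], [11, 3, 5]]
expected_table = expected_counts(observed_table)

for row in expected_table:
    print([round(e, 3) for e in row])
```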
Chi-Square Test Statistic: $X^2 = \sum_i (|O_i - E_i| - 0.5)^2 / E_i$
- $(|8 - 9.975| - 0.5)^2 / 9.975 = 0.218$
- $(|9 - 6.3| - 0.5)^2 / 6.3 = 0.768$
- $(|4 - 4.725| - 0.5)^2 / 4.725 = 0.011$
- $(|11 - 9.025| - 0.5)^2 / 9.025 = 0.241$
- $(|3 - 5.7| - 0.5)^2 / 5.7 = 0.849$
- $(|5 - 4.275| - 0.5)^2 / 4.275 = 0.012$
Thus, $X^2 = 0.218 + 0.768 + 0.011 + 0.241 + 0.849 + 0.012 = 2.099$
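The arithmetic above can be checked with a few lines of Python. This is only a sketch: the observed and expected values are copied from the six terms listed above, and the variable names are illustrative.

```python
# Observed and expected counts, in the same order as the six terms above.
observed = [8, 9, 4, 11, 3, 5]
expected = [9.975, 6.3, 4.725, 9.025, 5.7, 4.275]

# Per-cell contribution with the 0.5 continuity correction applied.
contributions = [(abs(o - e) - 0.5) ** 2 / e
                 for o, e in zip(observed, expected)]
corrected_statistic = sum(contributions)

print([round(c, 3) for c in contributions])
print(round(corrected_statistic, 3))
```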
P-Value: The test has (rows - 1) * (columns - 1) = (2 - 1) * (3 - 1) = 2 degrees of freedom. According to the Chi-Square to P-Value Calculator, the p-value that corresponds to a Chi-Square test statistic of 2.099 with 2 degrees of freedom is 0.3501.
Since this p-value is not less than 0.05, we fail to reject the null hypothesis. This means we do not have sufficient evidence to say that there is an association between gender and political party preference.
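The p-value can also be reproduced without a calculator. The sketch below relies on the fact that, for exactly 2 degrees of freedom, the upper-tail probability of the Chi-Square distribution has the closed form exp(-x/2); this shortcut does not hold for other degrees of freedom, where a statistics library would be needed. The function name `chi_square_p_value_df2` is hypothetical.

```python
import math

def chi_square_p_value_df2(statistic):
    """Upper-tail probability for a Chi-Square distribution with 2 degrees of freedom."""
    return math.exp(-statistic / 2)

# Test statistic from the worked example, compared against a 0.05 significance level.
p_value = chi_square_p_value_df2(2.099)
reject_null = p_value < 0.05

print(round(p_value, 4), reject_null)
```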