This lesson explains how to conduct a chi-square test for independence.
When to Use a Chi-Square Test for Independence
We use a chi-square test for independence when we want to formally test whether or not there is a significant association between two categorical variables from a single population.
Checking Conditions
Before we can conduct a chi-square test for independence, we first need to make sure the following conditions are met to ensure that our test will be valid:
- Random: A random sample or random experiment should be used to collect the data.
- Categorical: The variables we are studying should be categorical.
- Size: The expected count in each cell of the contingency table should be at least 5.
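The size condition can be checked directly from the observed table. Under independence, each cell's expected count is its row total times its column total divided by the grand total. Here is a minimal Python sketch, assuming the table is stored as a list of rows of counts. The helper names `expected_counts` and `size_condition_met` are hypothetical and chosen only for this illustration.

```python
def expected_counts(observed):
    """Expected count for every cell if the two variables were independent:
    (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def size_condition_met(observed, minimum=5):
    """Return True if every expected cell count is at least `minimum`."""
    return all(e >= minimum for row in expected_counts(observed) for e in row)


# Observed counts from the example below (rows: Male, Female;
# columns: Republican, Democrat, Independent).
observed = [[120, 90, 40], [110, 95, 45]]
print(size_condition_met(observed))  # True
```

For the example data the smallest expected count is 42.5, so the condition is comfortably met.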
If these conditions are met, we can then conduct the test. The following example shows how to conduct a chi-square test for independence.
Example: Chi-Square Test for Independence
We want to know whether or not gender is associated with political party preference. We take a simple random sample of 500 voters and survey them on their political party preference. Here are the results:
| | Republican | Democrat | Independent | Total |
|---|---|---|---|---|
| Male | 120 | 90 | 40 | 250 |
| Female | 110 | 95 | 45 | 250 |
| Total | 230 | 185 | 85 | 500 |
Does gender seem to be associated with political party preference? Use a 0.05 level of significance.
Step 1. State the hypotheses.
The null hypothesis (H0): Gender and political party preference are independent.
The alternative hypothesis (Ha): Gender and political party preference are not independent.
Step 2. Determine a significance level to use.
The problem tells us to use a 0.05 level of significance.
Step 3. Find the test statistic.
The test statistic is $$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
where Σ is just a fancy symbol that means “sum”, $O_i$ is the observed frequency in cell i of the contingency table, and $E_i$ is the expected frequency in that cell if the two variables were independent.
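The formula translates directly into code. Here is a minimal Python sketch, assuming the observed and expected counts are given as tables of the same shape. The function name `chi_square_statistic` is hypothetical. The usage lines plug in the observed counts from the example and the expected counts worked out in the following paragraphs.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of two same-shaped tables."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


observed = [[120, 90, 40], [110, 95, 45]]
expected = [[115, 92.5, 42.5], [115, 92.5, 42.5]]
print(round(chi_square_statistic(observed, expected), 3))  # 0.864
```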
Notice that we surveyed an equal number of males and females. This means that if there is no association between gender and political party preference, we can expect each party's supporters to be split 50/50 between males and females.
For example, we would expect 50% of all the people who said they were Republican to be female. That is, .50 * 230 = 115. We would also expect .50 * 230 = 115 of them to be male. Here are the expected counts for each political party, followed by the observed counts:
Expected counts:

| | Republican | Democrat | Independent | Total |
|---|---|---|---|---|
| Male | 115 | 92.5 | 42.5 | 250 |
| Female | 115 | 92.5 | 42.5 | 250 |
| Total | 230 | 185 | 85 | 500 |
Observed counts:

| | Republican | Democrat | Independent | Total |
|---|---|---|---|---|
| Male | 120 | 90 | 40 | 250 |
| Female | 110 | 95 | 45 | 250 |
| Total | 230 | 185 | 85 | 500 |
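The 50/50 shortcut is a special case of a general rule. A cell's expected count is the row's share of the whole sample times the column total. This short Python sketch reproduces the expected table from the observed counts; the variable names are illustrative.

```python
observed = [[120, 90, 40],   # Male
            [110, 95, 45]]   # Female

row_totals = [sum(row) for row in observed]        # [250, 250]
col_totals = [sum(col) for col in zip(*observed)]  # [230, 185, 85]
n = sum(row_totals)                                # 500

# Each row's share of the sample (here 250 / 500 = 0.5) times each column total.
expected = [[(r / n) * c for c in col_totals] for r in row_totals]
print(expected)  # [[115.0, 92.5, 42.5], [115.0, 92.5, 42.5]]
```

With unequal group sizes the row share would no longer be 0.5, but the same rule applies. The expected table always keeps the observed row and column totals.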
Lastly, calculate the chi-square test statistic:
$$\chi^2 = \frac{(120 - 115)^2}{115} + \frac{(110 - 115)^2}{115} + \frac{(90 - 92.5)^2}{92.5} + \frac{(95 - 92.5)^2}{92.5} + \frac{(40 - 42.5)^2}{42.5} + \frac{(45 - 42.5)^2}{42.5} \approx 0.864$$
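Listing each cell's contribution shows where the statistic comes from. This small Python sketch uses the observed and expected counts above; the dictionary layout is illustrative.

```python
# (observed, expected) for each (gender, party) cell.
cells = {
    ("Male", "Republican"): (120, 115),
    ("Female", "Republican"): (110, 115),
    ("Male", "Democrat"): (90, 92.5),
    ("Female", "Democrat"): (95, 92.5),
    ("Male", "Independent"): (40, 42.5),
    ("Female", "Independent"): (45, 42.5),
}

# Each cell contributes (O - E)^2 / E to the statistic.
contributions = {cell: (o - e) ** 2 / e for cell, (o, e) in cells.items()}
for cell, value in contributions.items():
    print(cell, round(value, 3))

chi2 = sum(contributions.values())
print(round(chi2, 3))  # 0.864
```

The Republican cells contribute the most (about 0.217 each), yet no cell is far from its expected count.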
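Use the Chi-Square Calculator with degrees of freedom = (r – 1)(c – 1), where r is the number of rows (gender categories) and c is the number of columns (party categories): (2 – 1)(3 – 1) = 2. Enter the test statistic .864 (this is not a critical value) and click “Calculate p-value”. The calculator returns .35079, which is the area to the left of .864. The p-value is the area to the right, so p-value = 1 – .35079 = .649.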
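The calculator step can also be reproduced without a lookup. For an even number of degrees of freedom, the chi-square upper-tail probability has the closed form $e^{-x/2}\sum_{k=0}^{df/2-1} (x/2)^k / k!$. For df = 2 this reduces to $e^{-x/2}$. Below is a minimal Python sketch using only the standard library; the function name `chi2_upper_tail_even_df` is hypothetical. If SciPy is installed, `scipy.stats.chi2.sf(x, df)` gives the same tail area for any df.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-square variable with an even, positive df.

    Closed form: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    (valid only for even df).
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


rows, cols = 2, 3
df = (rows - 1) * (cols - 1)
p_value = chi2_upper_tail_even_df(0.864, df)
print(df, round(p_value, 3))   # 2 0.649
print(round(1 - p_value, 3))   # 0.351, the area to the left of 0.864
```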
Step 4. Reject or fail to reject the null hypothesis.
Since the p-value (.649) is not less than our significance level of .05, we fail to reject the null hypothesis.
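The same decision can be made by comparing the test statistic with a critical value. For df = 2 the upper-tail area is $e^{-x/2}$. The 0.05 critical value therefore solves $e^{-x/2} = 0.05$, giving $x = -2\ln 0.05 \approx 5.991$. Here is a minimal Python sketch of both decision rules. The variable names are illustrative, and the critical-value formula holds only for df = 2.

```python
import math

alpha = 0.05
chi2_stat = 0.864   # test statistic from Step 3
p_value = 0.649     # p-value from Step 3

# Critical value for df = 2 only: solve exp(-x / 2) = alpha for x.
critical_value = -2 * math.log(alpha)

reject_by_p_value = p_value < alpha
reject_by_critical_value = chi2_stat > critical_value
print(round(critical_value, 3), reject_by_p_value, reject_by_critical_value)
# 5.991 False False
```

Both rules agree. The statistic 0.864 is far below 5.991, so we fail to reject H0.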
Step 5. Interpret the results.
Since we failed to reject the null hypothesis, we do not have sufficient evidence to state that there is an association between gender and political party preference.