A Chi-Square test of independence is used to determine whether there is a statistically significant association between two categorical variables.
This test makes four assumptions:
Assumption 1: Both variables are categorical.
Both variables must be categorical. That is, each variable takes on values that are names or labels rather than measured quantities.
Examples of categorical variables include:
- Marital status (“married”, “single”, “divorced”)
- Political preference (“republican”, “democrat”, “independent”)
- Smoking status (“smoker”, “non-smoker”)
Assumption 2: All observations are independent.
It’s assumed that every observation in the dataset is independent. That is, the value of one observation in the dataset does not affect the value of any other observation.
Assumption 3: Cells in the contingency table are mutually exclusive.
Each individual can be counted in only one cell of the contingency table. That is, the cells are mutually exclusive: no individual belongs to more than one cell.
Assumption 4: Expected value of cells should be 5 or greater in at least 80% of cells.
The expected value should be 5 or greater in at least 80% of the cells in the contingency table, and no cell should have an expected value less than 1.
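The rule in Assumption 4 can be written as a small check. The sketch below is only an illustration: the function name `expected_counts_ok`, its parameter names, and the sparse example table are all made up here, and the input is assumed to hold only the interior cells (no row or column totals).

```python
def expected_counts_ok(expected, min_count=5, min_share=0.8, floor=1):
    """Return True if the expected counts satisfy the rule of thumb.

    expected: list of rows, each a list of expected cell values
              (interior cells only, no totals).
    """
    cells = [value for row in expected for value in row]
    # Share of cells whose expected value is at least min_count.
    share_large = sum(value >= min_count for value in cells) / len(cells)
    # Both parts of the rule must hold: enough large cells, no tiny cell.
    return share_large >= min_share and min(cells) >= floor


# Hypothetical 2x3 table of expected values, invented for illustration.
sparse_table = [[6.0, 4.0, 2.5],
                [9.0, 6.0, 0.5]]
print(expected_counts_ok(sparse_table))  # False
```

Here only 3 of the 6 cells reach 5 and one cell falls below 1, so the check fails on both parts of the rule.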
The following example shows how to check each of these four assumptions in practice.
Example: Checking the Assumptions of a Chi-Square Test
Suppose we want to know whether or not gender is associated with political party preference.
We take a simple random sample of 500 voters and survey them on their political party preference. The following table shows the results of the survey:
Gender | Republican | Democrat | Independent | Total |
Male | 120 | 90 | 40 | 250 |
Female | 110 | 95 | 45 | 250 |
Total | 230 | 185 | 85 | 500 |
Before performing a Chi-Square test of independence, let’s verify that the four assumptions of the test are met.
Assumption 1: Both variables are categorical.
This assumption is easy to verify. We can see that the two variables in the contingency table are both categorical:
- Gender: This variable can only take on two categories – Male or Female.
- Political Party Preference: This variable can take on three categories – Republican, Democrat, or Independent.
Assumption 2: All observations are independent.
The only way to check this assumption is to verify that each individual included in this dataset was surveyed independently of every other individual.
If we used a random sampling method (like simple random sampling) then this assumption is likely met.
Assumption 3: Cells in the contingency table are mutually exclusive.
We can verify that this assumption is met by checking that no individual has been counted in more than one cell.
Assuming each individual in the dataset was only surveyed once, this assumption should be met because it’s not possible for an individual to be, say, a Male Republican and a Female Democrat simultaneously.
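If the raw survey records are available, one way to confirm that nobody was counted twice is to look for repeated respondent identifiers. The sketch below assumes each record carries a unique ID per person; the IDs, the rows, and the helper name `repeated_respondents` are hypothetical and not part of the survey described above.

```python
from collections import Counter

# Hypothetical raw survey records: (respondent_id, gender, party).
# The IDs and rows are invented for illustration.
responses = [
    (101, "Male", "Republican"),
    (102, "Female", "Democrat"),
    (103, "Female", "Independent"),
    (102, "Female", "Democrat"),  # respondent 102 was recorded twice
]


def repeated_respondents(records):
    """Return the sorted IDs that appear in more than one record."""
    counts = Counter(respondent_id for respondent_id, _, _ in records)
    return sorted(rid for rid, n in counts.items() if n > 1)


duplicates = repeated_respondents(responses)
print(duplicates)  # [102]
```

An empty list would indicate that each individual contributes exactly one count to the table.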
Assumption 4: Expected value of cells should be 5 or greater in at least 80% of cells.
We can use the following formula to calculate the expected values for each cell in the contingency table:
Expected value = (row sum * column sum) / table sum.
For example, the expected value for Male Republicans is (250 * 230) / 500 = 115, where 250 is the Male row sum and 230 is the Republican column sum.
We can apply this formula to every cell to obtain the full table of expected values:
Gender | Republican | Democrat | Independent | Total |
Male | 115 | 92.5 | 42.5 | 250 |
Female | 115 | 92.5 | 42.5 | 250 |
Total | 230 | 185 | 85 | 500 |
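The expected values in this table can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `expected_counts` is chosen here for illustration, and the observed counts are taken from the survey table.

```python
# Observed counts from the survey table
# (rows: Male, Female; columns: Republican, Democrat, Independent).
observed = [[120, 90, 40],
            [110, 95, 45]]


def expected_counts(table):
    """Expected value for each cell: row sum * column sum / table sum."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


expected = expected_counts(observed)
print(expected)  # [[115.0, 92.5, 42.5], [115.0, 92.5, 42.5]]
```

Both rows come out identical because the sample contains the same number of males and females (250 each).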
We can see that every cell in the table has an expected value of at least 5 (the smallest is 42.5), so this assumption is met.
Once we’ve verified that the four assumptions are met, we can perform a Chi-Square Test of Independence, for example with the online calculator listed under Additional Resources.
The test statistic is 0.864 with 2 degrees of freedom, and the p-value of the test is 0.649198. Since this p-value is not less than .05, we do not have sufficient evidence to say that there is an association between gender and political party preference.
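The p-value can also be checked by hand. The sketch below computes the Pearson Chi-Square statistic from the observed counts and then uses a shortcut that applies only to this table's shape: with exactly 2 degrees of freedom, the upper-tail probability of the Chi-Square distribution is exp(-x / 2). For tables of other sizes, a statistics library would be needed instead (SciPy's `scipy.stats.chi2_contingency` is one option).

```python
import math

# Observed counts from the survey table
# (rows: Male, Female; columns: Republican, Democrat, Independent).
observed = [[120, 90, 40],
            [110, 95, 45]]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
total = sum(row_sums)

# Pearson Chi-Square statistic: sum of (observed - expected)^2 / expected.
statistic = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_sums[i] * col_sums[j] / total
        statistic += (count - expected) ** 2 / expected

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# The closed form exp(-x / 2) is valid only when df == 2,
# so stop here if the table has a different shape.
assert df == 2
p_value = math.exp(-statistic / 2)

print(round(statistic, 3), df, round(p_value, 4))  # 0.864 2 0.6492
```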
Additional Resources
The following tutorials explain how to perform a Chi-Square Test of Independence in different statistical software:
How to Perform a Chi-Square Test of Independence in Excel
How to Perform a Chi-Square Test of Independence in R
How to Perform a Chi-Square Test of Independence in Python
How to Perform a Chi-Square Test of Independence in SPSS
Online Chi-Square Test of Independence Calculator