A Chi-Square Test of Independence is used to determine whether or not there is a significant association between two categorical variables.
This tutorial explains the following:
- The motivation for performing a Chi-Square Test of Independence.
- The formula to perform a Chi-Square Test of Independence.
- An example of how to perform a Chi-Square Test of Independence.
Chi-Square Test of Independence: Motivation
A Chi-Square test of independence can be used to determine if there is an association between two categorical variables in many different settings. Here are a few examples:
- We want to know if gender is associated with political party preference, so we survey 500 voters and record their gender and political party preference.
- We want to know if a person’s favorite color is associated with their favorite sport, so we survey 100 people and ask them about their preferences for both.
- We want to know if education level and marital status are associated, so we collect data about these two variables on a simple random sample of 50 people.
In each of these scenarios we want to know if two categorical variables are associated with each other. In each scenario, we can use a Chi-Square test of independence to determine if there is a statistically significant association between the variables.
Chi-Square Test of Independence: Formula
A Chi-Square test of independence uses the following null and alternative hypotheses:
- H_{0}: (null hypothesis) The two variables are independent.
- H_{1}: (alternative hypothesis) The two variables are not independent (i.e., they are associated).
We use the following formula to calculate the Chi-Square test statistic X^{2}:
X^{2} = Σ(O-E)^{2} / E
where:
- Σ: is a fancy symbol that means “sum”
- O: observed value
- E: expected value
If the p-value that corresponds to the test statistic X^{2} with (#rows-1)*(#columns-1) degrees of freedom is less than your chosen significance level, then you can reject the null hypothesis. Here #rows and #columns count the categories of each variable, not the total row or column.
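As a rough sketch of the formula above, the following plain-Python function computes X^{2} and the degrees of freedom for a table of observed counts. The function name `chi_square_statistic`, the list-of-rows input, and the small 2x2 table are illustrative assumptions. Each expected count is taken as (row total * column total) / grand total, the same rule the worked example in this tutorial uses.

```python
def chi_square_statistic(observed):
    """Return (statistic, degrees_of_freedom) for a table given as a list of rows.

    `observed` holds counts only (no total row or total column).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence for cell (i, j).
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Hypothetical 2x2 table, used only to exercise the function.
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
print(stat, dof)
```

Turning the statistic into a p-value still requires the Chi-Square distribution with `dof` degrees of freedom, which this sketch leaves to a calculator or a statistics library.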
Chi-Square Test of Independence: Example
Suppose we want to know whether or not gender is associated with political party preference. We take a simple random sample of 500 voters and survey them on their political party preference. The following table shows the results of the survey:
Gender | Republican | Democrat | Independent | Total |
Male | 120 | 90 | 40 | 250 |
Female | 110 | 95 | 45 | 250 |
Total | 230 | 185 | 85 | 500 |
Use the following steps to perform a Chi-Square test of independence to determine if gender is associated with political party preference.
Step 1: Define the hypotheses.
We will perform the Chi-Square test of independence using the following hypotheses:
- H_{0}: Gender and political party preference are independent.
- H_{1}: Gender and political party preference are not independent.
Step 2: Calculate the expected values.
Next, we will calculate the expected values for each cell in the contingency table using the following formula:
Expected value = (row sum * column sum) / table sum.
For example, the expected value for Male Republicans is: (250*230) / 500 = 115.
We can repeat this formula to obtain the expected value for each cell in the table:
Gender | Republican | Democrat | Independent | Total |
Male | 115 | 92.5 | 42.5 | 250 |
Female | 115 | 92.5 | 42.5 | 250 |
Total | 230 | 185 | 85 | 500 |
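The expected values can be reproduced with a few lines of plain Python. This is a minimal sketch: the variable names are illustrative choices, and the counts are copied from the survey table.

```python
# Observed counts from the survey table
# (rows: Male, Female; columns: Republican, Democrat, Independent).
observed = [
    [120, 90, 40],
    [110, 95, 45],
]

row_totals = [sum(row) for row in observed]        # one total per gender
col_totals = [sum(col) for col in zip(*observed)]  # one total per party
grand_total = sum(row_totals)

# Expected value = (row sum * column sum) / table sum, for every cell.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for label, row in zip(["Male", "Female"], expected):
    print(label, row)
```

Both rows come out the same here because each gender makes up exactly half of the sample.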
Step 3: Calculate (O-E)^{2} / E for each cell in the table.
Next we will calculate (O-E)^{2} / E for each cell in the table where:
- O: observed value
- E: expected value
For example, Male Republicans would have a value of: (120-115)^{2} /115 = 0.2174.
We can repeat this formula for each cell in the table:
Gender | Republican | Democrat | Independent |
Male | 0.2174 | 0.0676 | 0.1471 |
Female | 0.2174 | 0.0676 | 0.1471 |
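The per-cell values in this table can be checked with a short sketch like the one below. The variable names are illustrative, the observed counts come from the survey table, and the expected counts come from the formula in Step 2.

```python
# Observed and expected counts
# (rows: Male, Female; columns: Republican, Democrat, Independent).
observed = [[120, 90, 40], [110, 95, 45]]
expected = [[115.0, 92.5, 42.5], [115.0, 92.5, 42.5]]

# (O - E)^2 / E for each cell, pairing observed and expected cell by cell.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# Round to four decimals only for display, matching the table.
rounded = [[round(value, 4) for value in row] for row in contributions]

for label, row in zip(["Male", "Female"], rounded):
    print(label, row)
```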
Step 4: Calculate the test statistic X^{2} and the corresponding p-value.
X^{2} = Σ(O-E)^{2} / E = 0.2174 + 0.2174 + 0.0676 + 0.0676 + 0.1471 + 0.1471 = 0.8642
According to the Chi-Square Score to P Value Calculator, the p-value associated with X^{2} = 0.8642 and (2-1)*(3-1) = 2 degrees of freedom is approximately 0.649.
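If no calculator is at hand, the statistic and p-value for this example can be sketched with Python's standard library alone. The sketch relies on a shortcut that holds only for exactly 2 degrees of freedom, where the upper-tail probability of the Chi-Square distribution is exp(-x/2). For other degrees of freedom a statistics library (for example `scipy.stats.chi2.sf`) would be needed instead. Because the code sums unrounded cell values, the statistic comes out near 0.8640 rather than the 0.8642 obtained by adding the rounded table entries.

```python
import math

# Observed counts (rows: Male, Female; columns: Republican, Democrat, Independent).
observed = [[120, 90, 40], [110, 95, 45]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (O - E)^2 / E over every cell, without rounding intermediate values.
statistic = sum(
    (o - r * c / grand_total) ** 2 / (r * c / grand_total)
    for row, r in zip(observed, row_totals)
    for o, c in zip(row, col_totals)
)

dof = (len(observed) - 1) * (len(observed[0]) - 1)  # (2-1)*(3-1) = 2

# Shortcut valid ONLY for 2 degrees of freedom: P(X^2 >= x) = exp(-x / 2).
assert dof == 2
p_value = math.exp(-statistic / 2)

print(f"X^2 = {statistic:.4f}, df = {dof}, p-value = {p_value:.4f}")
```

The printed p-value should be close to 0.6492, in line with the calculator result.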
Step 5: Draw a conclusion.
Since this p-value is not less than 0.05 (a commonly used significance level), we fail to reject the null hypothesis. This means we do not have sufficient evidence to say that there is an association between gender and political party preference.
Note: You can also perform this entire test by simply using the Chi-Square Test of Independence Calculator.
Additional Resources
The following tutorials explain how to perform a Chi-Square test of independence using different statistical programs:
How to Perform a Chi-Square Test of Independence in Stata
How to Perform a Chi-Square Test of Independence in Excel
How to Perform a Chi-Square Test of Independence in SPSS
How to Perform a Chi-Square Test of Independence in Python
How to Perform a Chi-Square Test of Independence in R
Chi-Square Test of Independence on a TI-84 Calculator
Chi-Square Test of Independence Calculator