In statistics, there are two different types of Chi-Square tests:
1. The Chi-Square Goodness of Fit Test – Used to determine whether or not a categorical variable follows a hypothesized distribution.
2. The Chi-Square Test of Independence – Used to determine whether or not there is a significant association between two categorical variables.
Often you may need to perform each of these tests in the R programming language.
This tutorial explains how to interpret the results of both tests using step-by-step examples.
Example 1: Interpret Chi-Square Goodness of Fit Test Results in R
Suppose a store owner believes that an equal number of customers come into his shop each day from Monday through Friday.
To test this hypothesis, he records the number of customers that come into the shop during a given week and finds the following:
- Monday: 50 customers
- Tuesday: 60 customers
- Wednesday: 40 customers
- Thursday: 47 customers
- Friday: 53 customers
We can perform a Chi-Square goodness of fit test in R to determine if the data is consistent with the store owner’s claim.
To perform this test in R, we can use the chisq.test() function, which uses the following syntax:
chisq.test(x, p)
where:
- x: A numeric vector of observed frequencies.
- p: A numeric vector of expected proportions (must sum to 1).
The following code shows how to perform this test in practice:
#create vector of observed frequencies and vector of expected proportions
observed <- c(50, 60, 40, 47, 53)
expected <- c(.2, .2, .2, .2, .2)

#perform Chi-Square Goodness of Fit Test
chisq.test(x=observed, p=expected)

	Chi-squared test for given probabilities

data:  observed
X-squared = 4.36, df = 4, p-value = 0.3595
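As a sanity check, the same test statistic can be computed by hand from the goodness-of-fit formula (the sum of (observed - expected)^2 / expected), where each expected count is the hypothesized proportion times the total sample size. A minimal sketch:

```r
#observed counts and hypothesized proportions from the example above
observed <- c(50, 60, 40, 47, 53)
expected_prop <- c(.2, .2, .2, .2, .2)

#expected counts = hypothesized proportions * total number of customers
expected_counts <- expected_prop * sum(observed)  #50 customers per day

#Chi-Square statistic: sum of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected_counts)^2 / expected_counts)
chi_sq  #4.36, matching the chisq.test() output
```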
Here is how to interpret the results of the test:
- The Chi-Square test statistic is 4.36.
- The corresponding p-value is 0.3595.
Since the p-value (.3595) is not less than 0.05, we fail to reject the null hypothesis.
In the context of this example, it means we do not have sufficient evidence to say that the true distribution of customers is different from the distribution that the shop owner claimed.
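The p-value reported by chisq.test() is simply the upper-tail area of the Chi-Square distribution with 4 degrees of freedom beyond the test statistic, so it can be reproduced directly with pchisq():

```r
#p-value = P(Chi-Square with df = 4 exceeds 4.36)
pchisq(4.36, df=4, lower.tail=FALSE)  #0.3595
```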
Example 2: Interpret Chi-Square Test of Independence Results in R
Suppose researchers want to know whether or not gender is associated with political party preference.
They take a simple random sample of 500 voters and survey them on their political party preference:
|        | Republican | Democrat | Independent | Total |
|--------|------------|----------|-------------|-------|
| Male   | 120        | 90       | 40          | 250   |
| Female | 110        | 95       | 45          | 250   |
| Total  | 230        | 185      | 85          | 500   |
We can use the following syntax to perform a Chi-Square Test of Independence in R to determine if gender is associated with political party preference:
#create table to hold survey data
data <- matrix(c(120, 90, 40, 110, 95, 45), ncol=3, byrow=TRUE)
colnames(data) <- c("Rep","Dem","Ind")
rownames(data) <- c("Male","Female")
data <- as.table(data)

#perform Chi-Square Test of Independence
chisq.test(data)

	Pearson's Chi-squared test

data:  data
X-squared = 0.86404, df = 2, p-value = 0.6492
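Under the hood, chisq.test() compares each observed cell count to its expected count under independence, (row total * column total) / grand total, and sums the squared standardized differences. A minimal sketch that reproduces the statistic by hand:

```r
#rebuild the survey table from the example above
data <- matrix(c(120, 90, 40, 110, 95, 45), ncol=3, byrow=TRUE)

#expected count for each cell = (row total * column total) / grand total
expected <- outer(rowSums(data), colSums(data)) / sum(data)

#Pearson Chi-Square statistic
chi_sq <- sum((data - expected)^2 / expected)
chi_sq  #0.86404, matching the chisq.test() output
```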
Here is how to interpret the results of the test:
- The Chi-Square test statistic is 0.86404.
- The corresponding p-value is 0.6492.
Since the p-value (0.6492) of the test is not less than 0.05, we fail to reject the null hypothesis.
In the context of this example, this means we do not have sufficient evidence to say that there is an association between gender and political party preference.
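An equivalent way to reach the same conclusion is to compare the test statistic to the Chi-Square critical value at a significance level of 0.05 with 2 degrees of freedom, obtained from qchisq():

```r
#critical value for alpha = 0.05 and df = 2
crit <- qchisq(p=0.95, df=2)
crit  #5.991465

#we fail to reject the null since the test statistic is below the critical value
0.86404 < crit  #TRUE
```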
Additional Resources
The following tutorials explain how to perform other common tasks in R:
How to Perform a Chi-Square Test of Independence in R
How to Calculate the P-Value of a Chi-Square Statistic in R
How to Find the Chi-Square Critical Value in R