In statistics, the G-test of Goodness of Fit is used to determine whether or not some categorical variable follows a hypothesized distribution.
This test is an alternative to the Chi-Square Goodness of Fit test and is often used when some observed counts are far from their expected counts or when the dataset you're working with is extremely large.
The G-Test of Goodness of Fit uses the following null and alternative hypotheses:
- H0: A variable follows a hypothesized distribution.
- HA: A variable does not follow a hypothesized distribution.
The test statistic is calculated as follows:
G = 2 * Σ[O * ln(O/E)]
where the sum is taken over all cells and:
- O: The observed count in a cell
- E: The expected count in a cell
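As a minimal sketch, the formula above can be written as a small base R function. The name `g_statistic` and its arguments are hypothetical, chosen only for illustration. Cells with an observed count of 0 are treated as contributing 0 to the sum, which is the usual convention.

```r
# Hypothetical helper: G statistic from observed and expected counts
g_statistic <- function(observed, expected) {
  stopifnot(length(observed) == length(expected), all(expected > 0))
  # A cell with O = 0 contributes 0 (the limit of O * ln(O/E) as O -> 0)
  terms <- ifelse(observed > 0, observed * log(observed / expected), 0)
  2 * sum(terms)
}

# Sanity check with made-up counts: when observed equals expected, G is 0
g_statistic(c(10, 20, 30), c(10, 20, 30))
```

The larger the gap between observed and expected counts, the larger G becomes.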
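Under the null hypothesis, G approximately follows a Chi-Square distribution with (number of categories - 1) degrees of freedom. If the p-value that corresponds to the test statistic is less than your chosen significance level (e.g. .05), then you can reject the null hypothesis and conclude that the variable under study does not follow the hypothesized distribution.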
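The decision rule can be sketched in base R as follows, assuming the usual Chi-Square approximation with (number of categories - 1) degrees of freedom. The function name `g_test_decision`, its arguments, and the input value G = 7 are hypothetical and used only for illustration.

```r
# Hypothetical helper: p-value and decision for a given G statistic
g_test_decision <- function(g, n_categories, alpha = 0.05) {
  df <- n_categories - 1
  # Upper-tail probability of the Chi-Square distribution
  p_value <- pchisq(g, df = df, lower.tail = FALSE)
  list(df = df, p_value = p_value, reject_null = p_value < alpha)
}

# Illustrative values only: G = 7 with 3 categories (2 degrees of freedom)
result <- g_test_decision(g = 7, n_categories = 3)
result$p_value      # exp(-3.5), about 0.0302
result$reject_null  # TRUE at alpha = 0.05
```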
The following example shows how to perform a G-test of Goodness of Fit in practice.
Example: G-test of Goodness of Fit
A biologist claims that three species of turtles exist in equal proportions in a certain area. To test this claim, an independent researcher counts the turtles of each species and finds the following:
- Species A: 80
- Species B: 125
- Species C: 95
The independent researcher can use the following steps to perform a G-test of Goodness of Fit to determine if the data she collected is consistent with the biologist’s claim.
Step 1: State the null and alternative hypotheses.
The researcher will perform the G-test of Goodness of Fit using the following hypotheses:
- H0: The three species of turtles exist in equal proportions in this area.
- HA: The three species of turtles do not exist in equal proportions in this area.
Step 2: Calculate the test statistic.
The formula to calculate the test statistic is as follows:
G = 2 * Σ[O * ln(O/E)]
In this example, there are 80 + 125 + 95 = 300 total observed turtles. If the three species existed in equal proportions, we would expect to observe 300/3 = 100 turtles from each species. Thus, we can calculate the test statistic as:
G = 2 * [80*ln(80/100) + 125*ln(125/100) + 95*ln(95/100)] = 10.337
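The hand calculation above can be checked with a few lines of base R. This is only a sketch; the variable names are illustrative.

```r
# Observed turtle counts from the example
observed <- c(A = 80, B = 125, C = 95)

# Expected counts under equal proportions: 300 * 1/3 = 100 per species
expected <- sum(observed) * rep(1/3, 3)

# G = 2 * sum(O * ln(O/E)); log() is the natural logarithm in R
G <- 2 * sum(observed * log(observed / expected))
round(G, 3)  # 10.337
```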
Step 3: Calculate the p-value of the test statistic.
According to the Chi-Square to P-Value Calculator, the p-value associated with a test statistic of 10.337 and (number of categories - 1) = 3 - 1 = 2 degrees of freedom is 0.005693.
Since this p-value is less than .05, the researcher would reject the null hypothesis. This means she has sufficient evidence to say that the three species of turtles do not exist in equal proportions in this particular area.
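If no online calculator is at hand, the same p-value can be obtained in base R with `pchisq()`. This is a minimal sketch using the values from this example.

```r
# Test statistic and degrees of freedom from the example
G  <- 10.337
df <- 3 - 1

# Upper-tail probability of a Chi-Square distribution with 2 degrees of freedom
p_value <- pchisq(G, df = df, lower.tail = FALSE)
round(p_value, 6)  # about 0.005693

# Decision at the .05 significance level
p_value < 0.05     # TRUE, so the null hypothesis is rejected
```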
Bonus: G-test of Goodness of Fit in R
You can use the GTest() function from the DescTools package to quickly perform a G-test of Goodness of Fit in R.
The following code shows how to perform a G-test for the previous example:
    #load the DescTools library
    library(DescTools)

    #perform the G-test
    GTest(x = c(80, 125, 95),      #observed values
          p = c(1/3, 1/3, 1/3),    #expected proportions
          correct = "none")

        Log likelihood ratio (G-test) goodness of fit test

    data:  c(80, 125, 95)
    G = 10.337, X-squared df = 2, p-value = 0.005693
Notice that the G test statistic is 10.337 and the corresponding p-value is 0.005693. Since this p-value is less than .05, we would reject the null hypothesis.
This matches the results that we calculated by hand.
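Since the G-test is an alternative to the Chi-Square Goodness of Fit test, it can be instructive to run both on the same data. The comparison below is not part of the worked example above; it is a brief sketch using the base R function `chisq.test()`.

```r
# Observed turtle counts from the example
observed <- c(80, 125, 95)

# Pearson Chi-Square Goodness of Fit test against equal proportions
pearson <- chisq.test(x = observed, p = rep(1/3, 3))
pearson$statistic  # X-squared = 10.5
pearson$p.value    # about 0.00525
```

For this dataset the two statistics are close (10.337 vs. 10.5), and both tests lead to the same conclusion at the .05 significance level.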
Feel free to use this G-test of Goodness of Fit Calculator to automatically perform a G-test for any dataset.