Grubbs’ Test is a statistical test that can be used to identify the presence of one outlier in a dataset. To use this test, the dataset you’re analyzing should be approximately normally distributed and, ideally, should have at least 7 observations.
Note: If you suspect that there is more than one outlier in the dataset, then you should instead use the generalized extreme studentized deviate test for outliers.
If you suspect that the maximum value in the dataset is an outlier, the test statistic is calculated as:
G = (xmax – x) / s
If you suspect that the minimum value in the dataset is an outlier, the test statistic is calculated as:
G = (x – xmin) / s
And if you’re not sure if the maximum value or minimum value in the dataset is an outlier and you want to perform a two-sided test, then the test statistic is calculated as:
G = max|xi – x| / s
where x is the sample mean and s is the sample standard deviation.
The critical value for the test is calculated as:
Gcritical = (n-1)tcritical / √[n(n-2 + t2critical)]
where tcritical is the critical value of the t distribution with n-2 degrees of freedom and the significance level is α/n for a single-tail test and α/(2n) for a two-tailed test.
Example: Grubbs’ Test in Excel
Determine whether or not the value 60 is an outlier in the following dataset:
Step 1: First, we need to make sure that the data is approximately normally distributed. To do so, we can create a histogram to verify that the distribution roughly has a bell-shape. The following screenshot shows how to create a histogram in Excel using the Data Analysis ToolPak:
From the histogram, we can see that the data is roughly normally distributed. This means that we can go ahead and conduct Grubbs’ Test.
Step 2: Next, we’ll conduct Grubbs’ Test to determine if the value 60 is actually an outlier in the dataset. The screenshot below shows the formulas to use to conduct Grubbs’ Test:
The test statistic, G, in cell D4 is 3.603219.
The critical value, Gcritical, in cell D11 is 2.556581. Since the test statistic is greater than the critical value, this means that the value 60 is indeed an outlier in this dataset.
What to Do if an Outlier is Identified
If Grubbs’ Test does identify an outlier in your dataset, you have a few options:
- Double check to make sure that the value is not a typo or a data entry error. Occasionally, values that show up as outliers in datasets are simply typos made by an individual when entering the data. Go back and verify that the value was entered correctly before you make any further decisions.
- Assign a new value to the outlier. If the outlier turns out to be a result of a typo or data entry error, you may decide to assign a new value to it, such as the mean or the median of the dataset.
- Remove the outlier. If the value is a true outlier, you may choose to remove it if it will have a significant impact on your overall analysis.
No matter what you decide to do with the outlier, be sure to make a note of it when you present the final conclusions of your analysis.