Dixon’s Q Test for Detecting Outliers (Explanation + Example)

This tutorial provides a simple explanation of Dixon’s Q test for detecting outliers in a dataset along with a few examples of how to conduct the test.

What is Dixon’s Q Test?

Dixon’s Q Test, often referred to simply as the Q Test, is a statistical test that is used for detecting outliers in a dataset.

The test statistic for the Q test is as follows:

Q = |xa – xb| / R

where xa is the suspected outlier, xb is the data point closest to xa, and R is the range of the dataset. In most cases, xa is the maximum value in the dataset but it can also be the minimum value.

It’s important to note that the Q test is typically performed on small datasets and the test assumes that the data is normally distributed. It’s also important to note that the Q test should only be conducted one time for a given dataset.

How to Conduct Dixon’s Q Test By Hand

Suppose we have the following dataset: 

1, 3, 5, 7, 8, 9, 13, 25

We can follow the standard five-step procedure for hypothesis testing to conduct Dixon’s Q Test by hand to determine if the maximum value in this dataset is an outlier:

Step 1. State the hypotheses. 

The null hypothesis (H0): The max is not an outlier.

The alternative hypothesis (Ha): The max is an outlier.

Step 2. Determine a significance level to use.

Common choices are 0.1, 0.05, and 0.01. We will use a 0.05 level of significance for this example.

Step 3. Find the test statistic.

Q = |xa – xb| / R

In this case, our max value is xa = 25, our next closest value is xb = 13, and our range is R = 25 – 1 = 24.

Thus, Q = |25 – 13| / 24 = 0.5.
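
The arithmetic above can be reproduced in a few lines of base R. The following is a minimal sketch rather than a reference implementation: the helper name q_statistic and its suspect argument are hypothetical names introduced here for illustration, and the function only covers the simple formula Q = |xa – xb| / R described in this tutorial.

```r
# Hypothetical helper: Dixon's Q for the largest or smallest value of a sample
q_statistic <- function(x, suspect = c("max", "min")) {
  suspect <- match.arg(suspect)
  x <- sort(x)               # order the values from smallest to largest
  n <- length(x)
  stopifnot(n >= 3)          # the Q test needs at least three values
  r <- x[n] - x[1]           # R: range of the dataset
  stopifnot(r > 0)           # avoid dividing by zero when all values are equal
  # gap between the suspected outlier and its nearest neighbor
  gap <- if (suspect == "max") x[n] - x[n - 1] else x[2] - x[1]
  gap / r
}

data <- c(1, 3, 5, 7, 8, 9, 13, 25)

q_max <- q_statistic(data)          # |25 - 13| / 24
q_min <- q_statistic(data, "min")   # |1 - 3| / 24
print(q_max)                        # 0.5, matching the hand calculation
print(q_min)
```

Sorting first means the nearest neighbor of the maximum (or minimum) is simply the adjacent element, so no explicit search for xb is needed.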

Next, we can compare this test statistic to the Q test critical values, which are shown below for various sample sizes (n) and confidence levels:

n     90%      95%      99%
3     0.941    0.970    0.994
4     0.765    0.829    0.926
5     0.642    0.710    0.821
6     0.560    0.625    0.740
7     0.507    0.568    0.680
8     0.468    0.526    0.634
9     0.437    0.493    0.598
10    0.412    0.466    0.568
11    0.392    0.444    0.542
12    0.376    0.426    0.522
13    0.361    0.410    0.503
14    0.349    0.396    0.488
15    0.338    0.384    0.475
16    0.329    0.374    0.463
17    0.320    0.365    0.452
18    0.313    0.356    0.442
19    0.306    0.349    0.433
20    0.300    0.342    0.425
21    0.295    0.337    0.418
22    0.290    0.331    0.411
23    0.285    0.326    0.404
24    0.281    0.321    0.399
25    0.277    0.317    0.393
26    0.273    0.312    0.388
27    0.269    0.308    0.384
28    0.266    0.305    0.380
29    0.263    0.301    0.376
30    0.260    0.298    0.372

The critical value for a sample size of 8 and a confidence level of 95% is 0.526.

Step 4. Reject or fail to reject the null hypothesis.

Since our test statistic Q (0.5) is less than the critical value (0.526), we fail to reject the null hypothesis.
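
Steps 3 and 4 can also be sketched in base R. The snippet below copies a handful of rows from the 95% column of the table above into a named vector (q_crit_95, a name introduced here purely for illustration) and compares the test statistic against the matching entry. It is a sketch for this example only, not a complete lookup table.

```r
# 95% critical values for n = 4 to 10, copied from the table above
# (q_crit_95 is a hypothetical name used only in this sketch)
q_crit_95 <- c("4" = 0.829, "5" = 0.710, "6" = 0.625, "7" = 0.568,
               "8" = 0.526, "9" = 0.493, "10" = 0.466)

data <- c(1, 3, 5, 7, 8, 9, 13, 25)
n <- length(data)

# Test statistic for the maximum: gap to the nearest neighbor over the range
x <- sort(data)
q <- (x[n] - x[n - 1]) / (x[n] - x[1])

# Look up the critical value by sample size and apply the decision rule
critical <- q_crit_95[[as.character(n)]]
reject_h0 <- q > critical

cat("Q =", q, "| critical value =", critical, "| reject H0:", reject_h0, "\n")
```

Here reject_h0 is FALSE because 0.5 is below 0.526, which agrees with the decision reached by hand.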

Step 5. Interpret the results. 

Since we failed to reject the null hypothesis, we do not have sufficient evidence to say that the max value of 25 is an outlier in this dataset.

How to Conduct Dixon’s Q Test in R

To conduct Dixon’s Q Test on the same dataset in R, we can use the dixon.test() function from the outliers package, which uses the following syntax:

dixon.test(data, type = 10, opposite = FALSE)

  • data: a numeric vector of data values
  • type: the variant of the formula used to calculate the test statistic Q. Set to 10 to use the formula outlined earlier.
  • opposite: If FALSE, the test checks the value that differs most from the sample mean (the maximum value in this example). If TRUE, the test checks the value at the opposite end of the dataset instead. This is FALSE by default.

Note: You can view the complete documentation for dixon.test() by running ?dixon.test in R after loading the outliers package.

The following code illustrates how to conduct Dixon’s Q Test to determine if the maximum value in the dataset is an outlier.

#load the outliers library
library(outliers)

#create data
data <- c(1, 3, 5, 7, 8, 9, 13, 25)

#conduct Dixon's Q Test
dixon.test(data, type = 10)

#	Dixon test for outliers
#
#data:  data
#Q = 0.5, p-value = 0.06913
#alternative hypothesis: highest value 25 is an outlier

From the output we can see that the test statistic is Q = 0.5 and the corresponding p-value is 0.06913. Since this p-value is greater than 0.05, we fail to reject the null hypothesis at the 0.05 significance level, so we do not have sufficient evidence to say that 25 is an outlier. This matches the result we got by hand.
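
If the decision needs to be made in code rather than by reading the printed output, the individual values can be pulled out of the result. The sketch below assumes the outliers package is installed and relies on dixon.test() returning a standard "htest" object; the variable names (alpha, result, q_stat, p_value, reject_h0) are chosen here for illustration.

```r
# Assumes the outliers package is installed: install.packages("outliers")
library(outliers)

data <- c(1, 3, 5, 7, 8, 9, 13, 25)
alpha <- 0.05                        # significance level chosen in Step 2

# Store the test result instead of printing it
result <- dixon.test(data, type = 10)

q_stat  <- unname(result$statistic)  # test statistic Q
p_value <- result$p.value            # p-value of the test

# Reject H0 only when the p-value falls below alpha
reject_h0 <- p_value < alpha
print(reject_h0)                     # FALSE here, since 0.06913 > 0.05
```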
