Kolmogorov-Smirnov Test in R (With Examples)

The Kolmogorov-Smirnov test is used to test whether a sample comes from a specified distribution, or whether two samples come from the same distribution.
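For reference, the Kolmogorov-Smirnov statistic D is the largest vertical distance between two cumulative distribution functions (CDFs). In the one-sample test it compares the empirical CDF F_n of a sample x_1, ..., x_n with a hypothesized continuous CDF F; in the two-sample test it compares the empirical CDFs F_{1,n} and F_{2,m} of two samples of sizes n and m:

```latex
\begin{aligned}
F_n(x) &= \frac{1}{n}\,\#\{\, i : x_i \le x \,\} \\
D_{\text{one-sample}} &= \sup_x \,\lvert F_n(x) - F(x) \rvert \\
D_{\text{two-sample}} &= \sup_x \,\lvert F_{1,n}(x) - F_{2,m}(x) \rvert
\end{aligned}
```

Large values of D are evidence against the null hypothesis.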

To perform a one-sample or two-sample Kolmogorov-Smirnov test in R, we can use the ks.test() function.
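As a minimal sketch of the two call forms (the vectors x and y below are illustrative simulated data): in the one-sample form, the second argument names a CDF such as "pnorm", and any further arguments are passed on to that function; in the two-sample form, the second argument is another numeric vector. Both forms return an "htest" object whose statistic and p.value components hold D and the p-value.

```r
set.seed(42)  # illustrative data only
x <- rnorm(50)
y <- runif(40)

# One-sample form: compare x with a named continuous CDF;
# mean and sd are forwarded to pnorm()
res1 <- ks.test(x, "pnorm", mean = 0, sd = 1)

# Two-sample form: compare two numeric vectors
res2 <- ks.test(x, y)

# Both results are "htest" objects
res1$statistic  # the D statistic
res1$p.value    # the p-value
res2$statistic
res2$p.value
```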

This tutorial shows examples of how to use this function in practice.

Example 1: One Sample Kolmogorov-Smirnov Test

Suppose we have the following sample data:

```
#make this example reproducible
set.seed(0)

#generate dataset of 20 values that follow a Poisson distribution with mean=5
data <- rpois(n=20, lambda=5)
```

The following code shows how to perform a Kolmogorov-Smirnov test on this sample of 20 data values to determine if it came from a standard normal distribution (by default, pnorm() uses mean 0 and standard deviation 1):

```
#perform Kolmogorov-Smirnov test
ks.test(data, "pnorm")

One-sample Kolmogorov-Smirnov test

data:  data
D = 0.97725, p-value < 2.2e-16
alternative hypothesis: two-sided
```

From the output we can see that the test statistic is 0.97725 and the corresponding p-value is less than 2.2e-16.
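To see where D comes from, the sketch below recomputes it by hand: sort the data, evaluate the hypothesized CDF at each point, and take the largest gap between that CDF and the empirical CDF just before and at each jump. The helper name ks_one_sample_d is hypothetical (it is not part of R), and it assumes a continuous hypothesized CDF.

```r
# Hypothetical helper: one-sample KS statistic D = sup |F_n(x) - F(x)|
# for a continuous hypothesized CDF `cdf` (e.g. pnorm)
ks_one_sample_d <- function(x, cdf, ...) {
  x <- sort(x)
  n <- length(x)
  f <- cdf(x, ...)                  # hypothesized CDF at each sorted point
  gap_above <- (1:n) / n - f        # ECDF value at the jump minus F
  gap_below <- f - (0:(n - 1)) / n  # F minus ECDF value just before the jump
  max(gap_above, gap_below)
}

set.seed(0)
data <- rpois(n = 20, lambda = 5)

ks_one_sample_d(data, pnorm)
# ks.test() warns about ties here because Poisson values repeat
suppressWarnings(ks.test(data, "pnorm"))$statistic
```

The two printed values should agree, since ks.test() takes the same maximum over the sorted sample.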

Since the p-value is less than .05, we reject the null hypothesis. We have sufficient evidence to say that the sample data does not come from a standard normal distribution.

This result shouldn’t be surprising since we generated the sample data using the rpois() function, which generates random values that follow a Poisson distribution.
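Because ks.test(data, "pnorm") uses pnorm()'s defaults (mean 0, standard deviation 1), it compares these Poisson values with a distribution centered far away from them. To test against a different normal distribution, the parameters can be passed after the distribution name. The sketch below uses mean 5 and standard deviation sqrt(5), the mean and standard deviation of a Poisson distribution with mean 5; these values are fixed in advance, because ks.test() assumes the parameters are not estimated from the same data.

```r
set.seed(0)
data <- rpois(n = 20, lambda = 5)

# Against the standard normal (pnorm's defaults: mean = 0, sd = 1);
# warnings about ties are suppressed because Poisson values repeat
res_std <- suppressWarnings(ks.test(data, "pnorm"))

# Against a normal distribution with pre-specified mean 5 and sd sqrt(5)
res_matched <- suppressWarnings(ks.test(data, "pnorm", mean = 5, sd = sqrt(5)))

c(standard = res_std$p.value, matched = res_matched$p.value)
```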

Example 2: Two Sample Kolmogorov-Smirnov Test

Suppose we have the following two sample datasets:

```
#make this example reproducible
set.seed(0)

#generate 20 Poisson values (mean=5) and 100 standard normal values
data1 <- rpois(n=20, lambda=5)
data2 <- rnorm(100)
```

The following code shows how to perform a Kolmogorov-Smirnov test on these two samples to determine if they came from the same distribution:

```
#perform Kolmogorov-Smirnov test
ks.test(data1, data2)

Two-sample Kolmogorov-Smirnov test

data:  data1 and data2
D = 0.99, p-value = 1.299e-14
alternative hypothesis: two-sided
```

From the output we can see that the test statistic is 0.99 and the corresponding p-value is 1.299e-14. Since the p-value is less than .05, we reject the null hypothesis.

We have sufficient evidence to say that the two sample datasets do not come from the same distribution.

This result also shouldn’t be surprising since we generated values for the first sample using the Poisson distribution and values for the second sample using the normal distribution.
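The two-sample statistic is the largest absolute difference between the two empirical CDFs. Both empirical CDFs are step functions that only jump at observed values, so it is enough to compare them at the pooled data points. A minimal sketch using R's ecdf() (the helper name ks_two_sample_d is hypothetical):

```r
# Hypothetical helper: two-sample KS statistic, the largest absolute
# difference between the two empirical CDFs
ks_two_sample_d <- function(x, y) {
  fx <- ecdf(x)      # empirical CDF of the first sample
  fy <- ecdf(y)      # empirical CDF of the second sample
  pooled <- c(x, y)  # both step functions only jump at these points
  max(abs(fx(pooled) - fy(pooled)))
}

set.seed(0)
data1 <- rpois(n = 20, lambda = 5)
data2 <- rnorm(100)

ks_two_sample_d(data1, data2)
# warnings (if any) about ties in data1 are suppressed
suppressWarnings(ks.test(data1, data2))$statistic
```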


2 Replies to “Kolmogorov-Smirnov Test in R (With Examples)”

1. Lemon says:

I have two comments on the function ks.test() in R for the one-sample Kolmogorov-Smirnov test. First, ks.test(data, "pnorm") uses the default parameter values; try ks.test(data, "pnorm", 0, 1) and the output will be the same. But if you try ks.test(data, "pnorm", 5, sqrt(5)), where the parameters match the mean and standard deviation of the Poisson distribution used to simulate the data, you will get totally different results. The sample size of 20 is too small to reject the normality assumption.
Second, the parameters in ks.test() must be pre-specified, not estimated from the data, as indicated on the help page for ks.test(). For a KS normality test with parameters estimated from the data, nortest::lillie.test(data) is available.
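The second point above can be illustrated without extra packages. When the mean and standard deviation are estimated from the same data, D tends to be smaller than ks.test() expects, so its p-values are too large. The idea behind the Lilliefors test is to use the null distribution of D with estimated parameters instead. The sketch below approximates that distribution by Monte Carlo simulation; the function name ks_normal_mc and the default of 2000 simulations are illustrative choices, and its p-value is a simulation estimate that will not exactly match nortest::lillie.test(), which computes its p-value differently.

```r
# Hypothetical helper: KS normality test with estimated mean and sd,
# p-value obtained by Monte Carlo simulation (Lilliefors-style)
ks_normal_mc <- function(x, n_sim = 2000) {
  d_stat <- function(v) {
    v <- sort(v)
    n <- length(v)
    # Normal CDF using parameters estimated from this same sample
    p <- pnorm(v, mean = mean(v), sd = sd(v))
    max((1:n) / n - p, p - (0:(n - 1)) / n)
  }
  d_obs <- d_stat(x)
  # D with estimated parameters does not depend on the true mean and sd,
  # so simulating standard normal samples of the same size is enough
  d_null <- replicate(n_sim, d_stat(rnorm(length(x))))
  p_value <- (1 + sum(d_null >= d_obs)) / (n_sim + 1)
  list(statistic = d_obs, p.value = p_value)
}

set.seed(1)
ks_normal_mc(rnorm(50, mean = 10, sd = 3))$p.value  # normal data
ks_normal_mc(rexp(50))$p.value                      # skewed data
```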

2. houMoon says:

Thank you so much for your answer. I have a question: can you tell me what the "test statistic" is? Does it refer to the max of
|CDF(sample1) - CDF(sample2)|?
My English is poor. Thanks for reading.