A **permutation test** (sometimes called a randomization test) is a way to test for a difference in outcomes between different treatment groups.

This tutorial explains how to perform a permutation test using an example in R.

**A Permutation Test Example**

Researchers want to know whether training program A or training program B helps basketball players put on more weight over the course of one month. To test for differences between the training programs, researchers recruit 100 players and randomly assign 50 to use program A and 50 to use program B for one month.

```r
#set seed to make this example replicable
set.seed(0)

#create data frame that shows training program and resulting increase in weight
#for 100 players
data <- data.frame(treatment = rep(c('A', 'B'), each = 50))
data$weight_gain <- ifelse(data$treatment == 'A', rnorm(50, 10, 2), rnorm(50, 12, 2))

#view first 6 rows and last 6 rows of data frame
head(data); tail(data)

#  treatment weight_gain
#1         A   12.525909
#2         A    9.347533
#3         A   12.659599
#4         A   12.544859
#5         A   10.829283
#6         A    6.920100

#    treatment weight_gain
#95          B    13.19252
#96          B    12.23944
#97          B    11.43565
#98          B    14.91198
#99          B    12.45804
#100         B    13.99309
```

Next, we can find the difference in mean weight gain between the two programs:

```r
#find mean weight gain for players on program A
mean_A <- mean(data$weight_gain[data$treatment == 'A'])

#find mean weight gain for players on program B
mean_B <- mean(data$weight_gain[data$treatment == 'B'])

#find difference between mean weight gains of the programs
mean_B - mean_A

#[1] 1.99495
```

We can see that the players on program B gained an average of 1.99 more pounds than the players on program A.

Now, the idea behind the permutation test is that we can “shuffle” or **permute** the observed data by assigning different outcome values to each observation from among the set of actual observed outcomes.

When we permute the outcome values, each shuffle represents one alternative treatment assignment we could have had. Repeating the shuffle many times lets us see where the mean difference in our observed data falls relative to the mean differences we would expect if the treatment had no effect.

An exact permutation test requires enumerating every possible permutation of the data, but the number of permutations becomes astronomically large even for moderately sized datasets.

Thus, we can perform an approximate permutation test by simply performing a large number of resamples. This should allow us to approximate the actual permutation distribution.
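To get a sense of the scale: with 50 players per group, the number of distinct ways to label 50 of the 100 players as program A is a binomial coefficient, which R can compute directly:

```r
#number of distinct ways to assign 50 of the 100 players to program A
choose(100, 50)

#[1] 1.008913e+29
```

Enumerating roughly 10^29 assignments is out of the question, which is why we fall back on random resampling.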

To obtain a single permutation of the data, we can sample without replacement from the observed data and calculate the mean difference between the two programs once again:

```r
#sample from the original data treatment values without replacement
samp <- sample(data$treatment, length(data$treatment), replace = FALSE)

#create a new data frame with the original treatments ('A' and 'B') all mixed up
data_permutation <- data
data_permutation$treatment <- samp

#view the first 6 and last 6 rows of this new permuted data frame
head(data_permutation); tail(data_permutation)

#  treatment weight_gain
#1         B   12.525909
#2         A    9.347533
#3         A   12.659599
#4         B   12.544859
#5         A   10.829283
#6         A    6.920100

#    treatment weight_gain
#95          A    13.19252
#96          B    12.23944
#97          A    11.43565
#98          A    14.91198
#99          A    12.45804
#100         A    13.99309
```

Notice how all of the weight_gain values stayed the same, but the treatments are now mixed up. This new data frame represents one permuted version of our original data frame.

Now we can find the difference between the two training programs based on this permuted data:

```r
#find mean weight gain for players on program A
mean_A <- mean(data_permutation$weight_gain[data_permutation$treatment == 'A'])

#find mean weight gain for players on program B
mean_B <- mean(data_permutation$weight_gain[data_permutation$treatment == 'B'])

#find difference between mean weight gains of the programs
mean_B - mean_A

#[1] -0.4353178
```

If we repeat this process many times, we can create our approximate permutation distribution, which represents the sampling distribution for the mean difference. We can use a simple for loop in R to run this permutation process 3,000 times and output the mean differences to a vector:

```r
#create an empty vector
diff_vector <- numeric(0)

#create 3,000 permutations of the original data frame, find the mean difference
#between the two programs in each permutation, then output this mean difference
#to a vector
for(i in 1:3000) {
  samp <- sample(data$treatment, length(data$treatment), replace = FALSE)
  data_permutation <- data
  data_permutation$treatment <- samp
  mean_A <- mean(data_permutation$weight_gain[data_permutation$treatment == 'A'])
  mean_B <- mean(data_permutation$weight_gain[data_permutation$treatment == 'B'])

  #find difference between mean weight gains of the programs
  diff_vector[i] <- mean_B - mean_A
}

#view first 6 mean differences
head(diff_vector)

#[1] -0.4353178  0.2503843 -0.5839568 -0.1733017  0.1372405  0.6878647
```
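As an aside, the same resampling loop can be written more compactly with `replicate()`. This is just an equivalent sketch (the helper name `perm_diff` is ours, not part of the tutorial), and since `sample()` is random the individual draws will differ from the loop's:

```r
#one permuted mean difference: shuffle the treatment labels,
#keep the weight_gain values fixed
perm_diff <- function() {
  shuffled <- sample(data$treatment)  #sample() with no size argument returns a full permutation
  mean(data$weight_gain[shuffled == 'B']) - mean(data$weight_gain[shuffled == 'A'])
}

#repeat 3,000 times to build the approximate permutation distribution
diff_vector <- replicate(3000, perm_diff())
```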

Next, we can create a histogram to view this distribution of mean differences and draw a vertical line at our actual observed mean difference of 1.99495.

```r
hist(diff_vector, xlim = c(-2, 2), col = "steelblue", breaks = 100)
abline(v = 1.99495, col = "black", lwd = 2)
```

Our observed mean difference in weight gain appears quite extreme relative to the distribution of mean differences we would expect if weight gain were indeed independent of the treatment.

We can use this distribution to obtain a p-value for our observed mean difference by counting how many permuted mean differences are larger than the one we observed in our actual data, then dividing by the number of permutations, which is 3,000 in this case:

```r
#one-tailed test
sum(diff_vector > 1.99495)/3000

#[1] 0

#two-tailed test
sum(abs(diff_vector) > 1.99495)/3000

#[1] 0
```

Whether we conduct a one-tailed or two-tailed test, our estimated p-value is zero: none of the 3,000 permuted mean differences was as large as the one we observed. This indicates that our observed difference is very unlikely to be due to random chance if the weight gain was indeed independent of treatment.
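One caveat about that zero: an estimated permutation p-value of exactly 0 only means that none of our 3,000 shuffles produced a difference as extreme as the observed one; the true p-value is small but cannot literally be zero. A standard adjustment (common practice, though not part of the code above) counts the observed statistic as one more permutation so the estimate is never 0:

```r
#add-one adjustment: treat the observed statistic as one more permutation
p_two_tailed <- (sum(abs(diff_vector) >= 1.99495) + 1) / (3000 + 1)
p_two_tailed
```

With zero exceedances this gives 1/3001, roughly 0.00033, which is a more honest summary than p = 0.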

**Permutation Test Using the coin Library**

Instead of conducting a permutation test by writing our own code, we could simply use the **coin** library in R.

```r
#load coin library
library(coin)

#one-tailed test
independence_test(data$weight_gain ~ data$treatment, alternative = "less")

#	Asymptotic General Independence Test
#
#data:  data$weight_gain by data$treatment (A, B)
#Z = -4.9134, p-value = 4.475e-07
#alternative hypothesis: less

#two-tailed test
independence_test(data$weight_gain ~ data$treatment)

#	Asymptotic General Independence Test
#
#data:  data$weight_gain by data$treatment (A, B)
#Z = -4.9134, p-value = 8.949e-07
#alternative hypothesis: two.sided
```

From the results of the test we can see that our approximate permutation distribution led to the same inference as the **coin** library: in both cases the p-value is vanishingly small.
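Note that the output above says "Asymptotic": by default, `independence_test()` uses a normal approximation rather than actual resampling. To have **coin** run a Monte Carlo permutation test itself, pass a `distribution` argument; in recent versions of the package this is spelled `approximate(nresample = ...)` (older releases used `B = ...`), so check which spelling your installed version expects:

```r
library(coin)

#coin requires the grouping variable to be a factor
data$treatment <- factor(data$treatment)

#two-tailed Monte Carlo permutation test with 10,000 resamples
it <- independence_test(weight_gain ~ treatment, data = data,
                        distribution = approximate(nresample = 10000))
it
```

The reported p-value should again be near zero, consistent with both of the earlier approaches.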