A Guide to Permutation Tests

A permutation test (sometimes called a randomization test) is a way to test for a difference in outcomes between different treatment groups.

This tutorial explains how to perform a permutation test using an example in R. 

A Permutation Test Example

Researchers want to know whether training program A or training program B helps basketball players put on more weight over the course of one month. To test for differences between the training programs, researchers recruit 100 players and randomly assign 50 to use program A and 50 to use program B for one month.

#set seed to make this example replicable
set.seed(0)

#create data frame that shows training program and resulting increase in weight
#for 100 players
data <- data.frame(treatment = rep(c('A', 'B'), each = 50))
data$weight_gain <- ifelse(data$treatment == 'A', rnorm(50, 10, 2), rnorm(50, 12, 2))

#view first 6 rows and last 6 rows of data frame
head(data); tail(data)

#  treatment weight_gain
#1         A   12.525909
#2         A    9.347533
#3         A   12.659599
#4         A   12.544859
#5         A   10.829283
#6         A    6.920100

#    treatment weight_gain
#95          B    13.19252
#96          B    12.23944
#97          B    11.43565
#98          B    14.91198
#99          B    12.45804
#100         B    13.99309
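
We can also quickly confirm that the players were split evenly between the two programs:

#count the number of players assigned to each program
table(data$treatment)

# A  B 
#50 50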

Next, we can find the difference in mean weight gain between the two programs:

#find mean weight gain for players on program A
mean_A <- mean(data$weight_gain[data$treatment == 'A'])

#find mean weight gain for players on program B
mean_B <- mean(data$weight_gain[data$treatment == 'B'])

#find difference between mean weight gains of the programs
mean_B - mean_A

#[1] 1.99495

We can see that the players on program B gained an average of 1.99 more pounds than the players on program A.
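
As a quick alternative, we could compute both group means and their difference in a single line using tapply(), which applies a function to each group:

#compute the mean weight gain per program, then take the difference (B minus A)
diff(tapply(data$weight_gain, data$treatment, mean))

#      B 
#1.99495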

Now, the idea behind the permutation test is this: if the training program truly has no effect on weight gain, then the treatment labels are arbitrary, and we can "shuffle" (permute) the labels across the observed outcomes without changing anything meaningful.

By permuting the labels many times, we can see the alternative treatment assignments we could have had. This lets us see where the mean difference in our observed data falls relative to all of the mean differences we could have seen.

An exact permutation test requires examining all possible permutations of the data, but the number of permutations becomes astronomically large even for moderately sized data. Thus, we can perform an approximate permutation test by simply drawing a large number of random resamples. This should allow us to approximate the exact permutation distribution.
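
To see just how infeasible full enumeration is here, we can count the number of distinct ways to assign 50 of the 100 players to program A:

#number of distinct ways to choose which 50 of the 100 players get program A
choose(100, 50)

#[1] 1.008913e+29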

To obtain a single permutation of the data, we can sample without replacement from the observed data and calculate the mean difference between the two programs once again:

#sample from the original data treatment values without replacement
samp <- sample(data$treatment, length(data$treatment), replace = FALSE)

#create a new data frame with the original treatments ('A' and 'B') all mixed up
data_permutation <- data
data_permutation$treatment <- samp

#view the first 6 and last 6 rows of this new permuted data frame
head(data_permutation); tail(data_permutation)

#  treatment weight_gain
#1         B   12.525909
#2         A    9.347533
#3         A   12.659599
#4         B   12.544859
#5         A   10.829283
#6         A    6.920100

#    treatment weight_gain
#95          A    13.19252
#96          B    12.23944
#97          A    11.43565
#98          A    14.91198
#99          A    12.45804
#100         A    13.99309

Notice how all of the weight_gain values stayed the same, but the treatments are now mixed up. This new data frame represents one permuted version of our original data frame.

Now we can find the difference between the two training programs based on this permuted data:

#find mean weight gain for players on program A
mean_A <- mean(data_permutation$weight_gain[data_permutation$treatment == 'A'])

#find mean weight gain for players on program B
mean_B <- mean(data_permutation$weight_gain[data_permutation$treatment == 'B'])

#find difference between mean weight gains of the programs
mean_B - mean_A

#[1] -0.4353178

If we repeat this process many times, we can build our approximate permutation distribution, which approximates the sampling distribution of the mean difference under the null hypothesis of no treatment effect. We can use a simple for loop in R to run this permutation process 3,000 times and store the mean differences in a vector:

#preallocate a numeric vector to hold the mean differences
diff_vector <- numeric(3000)

#create 3,000 permutations of the original data frame, find the mean difference
#between the two programs in each permutation, then store this mean difference
#in the vector
for(i in 1:3000) {
  samp <- sample(data$treatment, length(data$treatment), replace = FALSE)

  data_permutation <- data
  data_permutation$treatment <- samp

  mean_A <- mean(data_permutation$weight_gain[data_permutation$treatment == 'A'])
  mean_B <- mean(data_permutation$weight_gain[data_permutation$treatment == 'B'])

  #find difference between mean weight gains of the programs
  diff_vector[i] <- mean_B - mean_A
}

#view first 6 mean differences
head(diff_vector)

#[1] -0.4353178 0.2503843 -0.5839568 -0.1733017 0.1372405 0.6878647
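
As an aside, the same resampling can be written more compactly with R's replicate() function; this is simply an alternative to the for loop above (the draws will differ from the loop's unless the seed is reset first):

#same permutation resampling, written with replicate() instead of a for loop
diff_vector2 <- replicate(3000, {
  samp <- sample(data$treatment)
  mean(data$weight_gain[samp == 'B']) - mean(data$weight_gain[samp == 'A'])
})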

Next, we can create a histogram to view this distribution of mean differences and draw a vertical line at our actual observed mean difference of 1.99495.

hist(diff_vector, xlim=c(-2, 2), col = "steelblue", breaks = 100)
abline(v = 1.99495, col = "black", lwd = 2)

Permutation test distribution of mean differences

Our observed mean difference in weight gain appears quite extreme relative to the distribution of mean differences we would expect to see if weight gain were truly independent of the training program.

We can use this distribution to obtain a p-value for our observed mean difference by counting how many permuted mean differences are larger than the one we observed in our actual data, then dividing by the number of permutations, which is 3,000 in this case: 

#one-tailed test
sum(diff_vector > 1.99495)/3000

#[1] 0

#two-tailed test
sum(abs(diff_vector) > 1.99495)/3000

#[1] 0

Whether we conduct a one-tailed or two-tailed test, none of the 3,000 permuted mean differences is as extreme as our observed difference, so the approximate p-value is zero in both cases. This indicates that a difference this large would be very unlikely to arise by chance if weight gain were truly independent of the training program.
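
Note that a p-value of exactly zero is an artifact of using a finite number of resamples. A common convention is to count the observed statistic itself as one additional permutation, which guarantees a nonzero estimate:

#two-tailed p-value with the observed statistic counted as one permutation
(sum(abs(diff_vector) >= 1.99495) + 1) / (3000 + 1)

#[1] 0.0003332223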

Permutation Test Using the coin Library

Instead of conducting a permutation test by writing our own code, we could simply use the coin library in R.

#load coin library
library(coin)

#independence_test() requires the grouping variable to be a factor (as of R 4.0,
#data.frame() no longer converts strings to factors automatically)
data$treatment <- as.factor(data$treatment)

#one-tailed test
independence_test(data$weight_gain ~ data$treatment, alternative = "less")

#	Asymptotic General Independence Test
#
#data:  data$weight_gain by data$treatment (A, B)
#Z = -4.9134, p-value = 4.475e-07
#alternative hypothesis: less

#two-tailed test
independence_test(data$weight_gain ~ data$treatment) 

#	Asymptotic General Independence Test
#
#data:  data$weight_gain by data$treatment (A, B)
#Z = -4.9134, p-value = 8.949e-07
#alternative hypothesis: two.sided

From the results of the test we can see that the coin library leads to the same inference as our approximate permutation distribution: in both cases the p-value is essentially zero, so we reject the hypothesis that weight gain is independent of the training program.
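
Note that independence_test() performs an asymptotic test by default. If we want coin to build an actual Monte Carlo permutation distribution, comparable to our 3,000-resample loop above, we can pass a distribution argument (recent versions of coin use the nresample argument; older versions called it B):

#two-tailed Monte Carlo permutation test based on 3,000 resamples
independence_test(data$weight_gain ~ data$treatment,
                  distribution = approximate(nresample = 3000))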
