Systematic Sampling in R (With Examples)


Researchers often take samples from a population and use the data from the sample to draw conclusions about the population as a whole.

One commonly used sampling method is systematic sampling, which is implemented with a simple two-step process:

1. Place each member of a population in some order.

2. Choose a random starting point and select every kth member from that point on to be in the sample, where k is the sampling interval.
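The two steps above can be sketched on a toy population before moving to a full data frame. This is only an illustration: the object names (population, positions, sys_sample), the population of 20 members, and the interval of 4 are all assumptions made for the sketch.

```r
# Hypothetical toy population of 20 members, already placed in some order
population <- sprintf("member_%02d", 1:20)

k <- 4                      # assumed sampling interval: take every 4th member
set.seed(42)                # arbitrary seed, only so the sketch is reproducible
start <- sample(1:k, 1)     # random starting point between 1 and k

# Positions start, start + k, start + 2k, ... up to the end of the list
positions <- seq(from = start, to = length(population), by = k)
sys_sample <- population[positions]

print(positions)
print(sys_sample)
```

Because 20 is a multiple of 4, every possible starting point yields a sample of exactly 5 members.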

This tutorial explains how to perform systematic sampling in R.

Example: Systematic Sampling in R

Suppose a superintendent wants to obtain a sample of 100 students from a school that has 500 total students. She chooses to use systematic sampling: she places the students in alphabetical order by last name, randomly chooses a starting point, and picks every 5th student to be in the sample. The interval of 5 comes from dividing the population size by the desired sample size (500 / 100 = 5).

The following code shows how to create a fake data frame to work with in R:

#make this example reproducible
set.seed(1)

#create simple function to generate n random last names (5 uppercase letters each)
randomNames <- function(n = 5000) {
  do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
}

#create data frame
df <- data.frame(last_name = randomNames(500),
                 gpa = rnorm(500, mean=82, sd=3))

#view first six rows of data frame
head(df)

  last_name      gpa
1     GONBW 82.19580
2     JRRWZ 85.10598
3     ORJFW 88.78065
4     XRYNL 85.94409
5     FMDCE 79.38993
6     XZBJC 80.49061
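The scenario calls for students to be in alphabetical order, while a data frame built this way stays in the order in which the random names were generated. A minimal sketch of the sorting step, using a hypothetical four-row data frame called roster with made-up GPA values, might look like this:

```r
# Hypothetical miniature roster, deliberately out of alphabetical order
roster <- data.frame(last_name = c("ORJFW", "GONBW", "XRYNL", "FMDCE"),
                     gpa = c(88.8, 82.2, 85.9, 79.4),
                     stringsAsFactors = FALSE)

# order() returns the row positions that sort last_name alphabetically
roster_sorted <- roster[order(roster$last_name), ]

# Reset row names so that row i is the i-th student in alphabetical order
rownames(roster_sorted) <- NULL

print(roster_sorted)
```

The same pattern, df[order(df$last_name), ], could be applied to the full data frame before sampling. Since the names here are random, sorting should not change the statistical properties of the sample, but it keeps the code aligned with the described procedure.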

The following code shows how to obtain a sample of 100 students through systematic sampling:

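#define function to obtain systematic sample
#N = population size, n = sample size (assumes N is a multiple of n)
obtain_sys <- function(N, n){
  k <- ceiling(N/n)         #sampling interval
  r <- sample(1:k, 1)       #random starting point between 1 and k
  seq(r, r + k*(n-1), k)    #row indices r, r+k, r+2k, ...
}

#obtain systematic sample
sys_sample_df <- df[obtain_sys(nrow(df), 100), ]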

#view first six rows of data frame
head(sys_sample_df)

   last_name      gpa
3      ORJFW 88.78065
8      RWPSB 81.96988
13     RACZU 79.21433
18     ZOHKA 80.47246
23     QJETK 87.09991
28     JTHWB 83.87300

#view dimensions of data frame
dim(sys_sample_df)

[1] 100   2

Notice that the first member included in the sample was in row 3 of the original data frame. Each subsequent member in the sample is located 5 rows after the previous member.
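The spacing can also be checked directly on the row indices rather than by reading the output. The sketch below repeats the index calculation on its own, with an arbitrary seed, so the starting row it draws will not necessarily be row 3:

```r
set.seed(1)                      # arbitrary seed for this sketch
N <- 500                         # population size
n <- 100                         # desired sample size
k <- ceiling(N / n)              # sampling interval (5 here)
r <- sample(1:k, 1)              # random start between 1 and k
idx <- seq(r, r + k * (n - 1), k)

print(unique(diff(idx)))         # gap between consecutive selected rows
print(range(idx))                # first and last selected row
print(length(idx))               # number of rows selected
```

Whatever the starting point, the gap between consecutive indices is always k, and the last index never exceeds 500 because 500 is a multiple of 100.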

The output of dim() confirms that the systematic sample is a data frame with 100 rows and 2 columns.
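The obtain_sys function works cleanly here because 500 is a multiple of 100. When the population size is not a multiple of the sample size, an index built from ceiling(N/n) can run past the last row, and indexing a data frame past its end returns rows of NA. One possible workaround, sketched below under the hypothetical name obtain_sys_frac (not part of the code above), is to allow a non-integer interval, an approach sometimes called the fractional interval method:

```r
# Hypothetical helper: systematic sampling with a non-integer interval,
# intended to return exactly n valid row indices for any N >= n
obtain_sys_frac <- function(N, n) {
  stopifnot(N >= n, n >= 1)
  k <- N / n                             # interval, allowed to be non-integer
  r <- runif(1, min = 0, max = k)        # random real-valued start below k
  idx <- floor(r + k * (0:(n - 1))) + 1  # convert real positions to row numbers
  pmin(idx, N)                           # guard against floating-point edge cases
}

set.seed(1)                              # arbitrary seed for this sketch
idx <- obtain_sys_frac(510, 100)

print(length(idx))
print(range(idx))
```

With this approach the gap between selected rows alternates between the two integers on either side of N/n (5 and 6 in this example) instead of being constant.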

Additional Resources

Types of Sampling Methods
Stratified Sampling in R
Cluster Sampling in R
