Researchers often take samples from a population and use the data from the sample to draw conclusions about the population as a whole.

One commonly used sampling method is **systematic sampling**, which is implemented with a simple two-step process:

**1.** Place each member of a population in some order.

**2.** Choose a random starting point and select every *n*th member to be in the sample.
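The two steps above can be sketched in a few lines of R. This is only a minimal illustration, assuming a toy population of 20 members and a sampling interval of 4; the names `population`, `interval`, `start`, and `selected` are made up for this sketch.

```r
# hypothetical toy population of 20 members (names are illustrative only)
population <- paste0("member_", 1:20)

# step 1: place the members in some order (here, alphabetical)
ordered_pop <- sort(population)

# step 2: choose a random starting point within the first interval,
# then select every 4th member from there
interval <- 4
start <- sample(1:interval, 1)
selected <- ordered_pop[seq(start, length(ordered_pop), by = interval)]

selected
```

Whichever starting point is drawn, this toy setup selects 5 of the 20 members.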

This tutorial explains how to perform systematic sampling in R.

**Example: Systematic Sampling in R**

Suppose a superintendent wants to obtain a sample of 100 students from a school that has 500 total students. She chooses to use systematic sampling in which she places each student in alphabetical order according to their last name, randomly chooses a starting point, and picks every 5th student to be in the sample.
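The interval of 5 in this scenario comes from dividing the population size by the desired sample size. A short sketch of that arithmetic follows; the variable names `N`, `n`, and `k` are chosen here for illustration.

```r
# population size and desired sample size from the scenario
N <- 500
n <- 100

# sampling interval: select every k-th student
k <- N / n
k

# the random starting point is drawn from positions 1 through k
1:k
```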

The following code shows how to create a fake data frame to work with in R:

    #make this example reproducible
    set.seed(1)

    #create simple function to generate random last names
    randomNames <- function(n = 5000) {
      do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
    }

    #create data frame
    df <- data.frame(last_name = randomNames(500),
                     gpa = rnorm(500, mean=82, sd=3))

    #view first six rows of data frame
    head(df)

      last_name      gpa
    1     GONBW 82.19580
    2     JRRWZ 85.10598
    3     ORJFW 88.78065
    4     XRYNL 85.94409
    5     FMDCE 79.38993
    6     XZBJC 80.49061
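The scenario describes arranging students alphabetically by last name, whereas the simulated data frame is left in the order it was generated. Since the names are random, that should not matter for the illustration, but with real records one would likely sort first. Here is a minimal sketch using a small made-up data frame (`toy_df` is hypothetical and not part of the tutorial's data):

```r
# small made-up data frame standing in for real student records
toy_df <- data.frame(last_name = c("Smith", "Adams", "Lopez", "Chen"),
                     gpa = c(81.2, 84.5, 79.9, 86.3))

# sort rows alphabetically by last name
toy_sorted <- toy_df[order(toy_df$last_name), ]

# reset row names so they reflect the new positions
rownames(toy_sorted) <- NULL

toy_sorted
```

Resetting the row names is optional, but it keeps the row labels aligned with each student's position in the sorted list, which makes the selected positions easier to read later.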

And the following code shows how to obtain a sample of 100 students through systematic sampling:

    #define function to obtain systematic sample
    obtain_sys = function(N,n){
      k = ceiling(N/n)
      r = sample(1:k, 1)
      seq(r, r + k*(n-1), k)
    }

    #obtain systematic sample
    sys_sample_df = df[obtain_sys(nrow(df), 100), ]

    #view first six rows of data frame
    head(sys_sample_df)

       last_name      gpa
    3      ORJFW 88.78065
    8      RWPSB 81.96988
    13     RACZU 79.21433
    18     ZOHKA 80.47246
    23     QJETK 87.09991
    28     JTHWB 83.87300

    #view dimensions of data frame
    dim(sys_sample_df)

    [1] 100   2

Notice that the first member included in the sample was in row 3 of the original data frame. Each subsequent member in the sample is located 5 rows after the previous member.
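One way to check this spacing is to look at the differences between consecutive selected row numbers. The sketch below is self-contained: it rebuilds a set of indices with the same interval logic (100 picks from 500 rows) rather than reusing the objects above, and the names `idx` and `start` are introduced here only for illustration.

```r
# illustrative values matching the example: 500 rows, sample of 100
N <- 500
n <- 100
k <- ceiling(N / n)

# random start in 1..k, then every k-th row
start <- sample(1:k, 1)
idx <- seq(start, start + k * (n - 1), by = k)

# every gap between consecutive selected rows should equal k
all(diff(idx) == k)

# no index should run past the last row of the population
max(idx) <= N
```

The second check matters when N is not an exact multiple of n. Rounding the interval up with `ceiling()` can then push the last indices beyond N: for example, N = 10 and n = 3 give k = 4, and a start of 4 yields rows 4, 8, and 12. In that situation it may be worth checking `max(idx)` before subsetting.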

Using **dim()**, we can see that the systematic sample we obtained is a data frame with 100 rows and 2 columns.

**Additional Resources**

Types of Sampling Methods

Stratified Sampling in R

Cluster Sampling in R