# Stratified Sampling in R (With Examples)

Researchers often take samples from a population and use the data from the sample to draw conclusions about the population as a whole.

One commonly used sampling method is stratified random sampling, in which a population is split into groups and a certain number of members from each group are randomly selected to be included in the sample.

This tutorial explains how to perform stratified random sampling in R.

## Example: Stratified Sampling in R

A high school has 400 students, each of whom is a Freshman, Sophomore, Junior, or Senior. Suppose we’d like to take a stratified sample of 40 students such that 10 students from each grade are included in the sample.

The following code shows how to create an example data frame of 400 students, with 100 students in each grade:

```
#make this example reproducible
set.seed(1)

#create data frame
df <- data.frame(grade = rep(c('Freshman', 'Sophomore', 'Junior', 'Senior'), each=100),
                 gpa = rnorm(400, mean=85, sd=3))

#view first six rows of data frame
head(df)

     grade      gpa
1 Freshman 83.12064
2 Freshman 85.55093
3 Freshman 82.49311
4 Freshman 89.78584
5 Freshman 85.98852
6 Freshman 82.53859
```
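Before sampling, it can help to confirm how many students fall into each stratum. The following is a minimal sketch that recreates the same data frame so it runs on its own; the name `strata_sizes` is only illustrative:

```r
# recreate the example data frame so this snippet is self-contained
set.seed(1)
df <- data.frame(grade = rep(c('Freshman', 'Sophomore', 'Junior', 'Senior'), each=100),
                 gpa = rnorm(400, mean=85, sd=3))

# count the number of students in each grade (each grade is one stratum)
strata_sizes <- table(df$grade)
strata_sizes
```

Each grade should contain 100 students, so a sample of 10 per grade corresponds to 10% of each stratum.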

### Stratified Sampling Using Number of Rows

The following code shows how to use the group_by() and sample_n() functions from the dplyr package to obtain a stratified random sample of 40 total students with 10 students from each grade:

```
library(dplyr)

#obtain stratified sample of 10 students from each grade
strat_sample <- df %>%
  group_by(grade) %>%
  sample_n(size=10)

#find frequency of students from each grade
table(strat_sample$grade)

 Freshman    Junior    Senior Sophomore 
       10        10        10        10 
```
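In dplyr 1.0.0 and later, sample_n() and sample_frac() are marked as superseded in favor of slice_sample(). Assuming such a version is installed, the same stratified sample could be sketched as follows (the name `strat_sample_new` is only illustrative, and the data frame is recreated so the snippet runs on its own):

```r
library(dplyr)

# recreate the example data frame so this snippet is self-contained
set.seed(1)
df <- data.frame(grade = rep(c('Freshman', 'Sophomore', 'Junior', 'Senior'), each=100),
                 gpa = rnorm(400, mean=85, sd=3))

# draw 10 rows at random, without replacement, within each grade
strat_sample_new <- df %>%
  group_by(grade) %>%
  slice_sample(n=10) %>%
  ungroup()

# find frequency of students from each grade
table(strat_sample_new$grade)
```

Using `prop=.15` in place of `n=10` would select a fraction of each group instead of a fixed number of rows.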

### Stratified Sampling Using Fraction of Rows

The following code shows how to use the group_by() and sample_frac() functions from the dplyr package to obtain a stratified random sample in which we randomly select 15% of students from each grade:

```
library(dplyr)

#obtain stratified sample of 15% of students from each grade
strat_sample <- df %>%
  group_by(grade) %>%
  sample_frac(size=.15)

#find frequency of students from each grade
table(strat_sample$grade)

 Freshman    Junior    Senior Sophomore 
       15        15        15        15 
```
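If dplyr is not available, a similar stratified sample can be drawn with base R alone. This is a rough sketch rather than the method used above; the names `rows_by_grade`, `sampled_rows`, and `strat_sample_base` are illustrative:

```r
# recreate the example data frame so this snippet is self-contained
set.seed(1)
df <- data.frame(grade = rep(c('Freshman', 'Sophomore', 'Junior', 'Senior'), each=100),
                 gpa = rnorm(400, mean=85, sd=3))

# split the row numbers of df into one vector per grade
rows_by_grade <- split(seq_len(nrow(df)), df$grade)

# within each grade, pick 10 row numbers at random without replacement
sampled_rows <- unlist(lapply(rows_by_grade, function(idx) sample(idx, size=10)))

# keep only the sampled rows
strat_sample_base <- df[sampled_rows, ]

# find frequency of students from each grade
table(strat_sample_base$grade)
```

Sampling row numbers rather than rows keeps the approach independent of which columns the data frame contains.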

## 2 Replies to “Stratified Sampling in R (With Examples)”

1. Cyrille says:

Great article there, Zach.
One of the few resources online I was able to find that defines and applies stratified probability sampling, with very little rambling. I was able to easily apply it to the problem I was working on. Thanks for the article.

2. Saed says:

Hi
I have three IDBs and this is the number of people registered from each

| Site | Female | Male | Total |
|---|---|---|---|
| IDB_A | 46 | 14 | 60 |
| IDB_B | 17 | 23 | 40 |
| IDB_C | 79 | 21 | 100 |
| Total | 142 | 58 | 200 |
And this is the sample I want to select from each site

| Site | Female | Male | Total |
|---|---|---|---|
| IDB_A | 20 | 6 | 26 |
| IDB_B | 7 | 10 | 17 |
| IDB_C | 34 | 9 | 43 |
| Total | 60 | 25 | 85 |
I used the following code, which creates three strata (one for each site) and then selects a random sample from each stratum (note: FBF_PDM is the name of the dataset):

```
str1 <- FBF_PDM[FBF_PDM$Sites=="IDB_A", ]
str2 <- FBF_PDM[FBF_PDM$Sites=="IDB_B", ]
str3 <- FBF_PDM[FBF_PDM$Sites=="IDB_C", ]

sample1 <- str1[sample(1:nrow(str1), 26, replace = FALSE), ]
sample2 <- str2[sample(1:nrow(str2), 17, replace = FALSE), ]
sample3 <- str3[sample(1:nrow(str3), 43, replace = FALSE), ]

overall <- rbind(sample1, sample2, sample3)

write.table(overall, "overall2.csv", row.names = FALSE, sep = " , ")
```
However, this code does not give me the exact sample I want. Is there a way to select the sample I want from each site by gender (male, female)?

Thanks
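One possible way to approach the question above is to treat each combination of site and gender as its own stratum and to draw a different number of rows from each. The sketch below is only an illustration under stated assumptions: the real FBF_PDM dataset is not shown, so a stand-in is built from the registration counts in the comment, and the column name `Gender` and the helper table `targets` are hypothetical.

```r
set.seed(1)

# hypothetical stand-in for FBF_PDM, built from the registration counts above;
# the column name Gender is an assumption
FBF_PDM <- data.frame(
  Sites  = rep(c("IDB_A", "IDB_A", "IDB_B", "IDB_B", "IDB_C", "IDB_C"),
               times = c(46, 14, 17, 23, 79, 21)),
  Gender = rep(c("female", "male", "female", "male", "female", "male"),
               times = c(46, 14, 17, 23, 79, 21)),
  stringsAsFactors = FALSE
)

# desired sample size for each site-by-gender stratum (taken from the comment)
targets <- data.frame(
  Sites  = c("IDB_A", "IDB_A", "IDB_B", "IDB_B", "IDB_C", "IDB_C"),
  Gender = c("female", "male", "female", "male", "female", "male"),
  n      = c(20, 6, 7, 10, 34, 9),
  stringsAsFactors = FALSE
)

# for each stratum, find its rows and draw n of them without replacement
picked <- lapply(seq_len(nrow(targets)), function(i) {
  idx <- which(FBF_PDM$Sites == targets$Sites[i] &
               FBF_PDM$Gender == targets$Gender[i])
  idx[sample.int(length(idx), targets$n[i])]
})

# combine the sampled rows from all strata
overall <- FBF_PDM[unlist(picked), ]

# check the number of sampled rows in each site-by-gender cell
table(overall$Sites, overall$Gender)
```

Because the sizes are set per cell, the sample matches the requested female and male counts at each site. The six cell counts listed in the comment add up to 86, so `overall` contains 86 rows.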