How to Select Random Samples in R (With Examples)


To select a random sample in R, we can use the sample() function, which has the following syntax:

sample(x, size, replace = FALSE, prob = NULL)

where:

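  • x: A vector of elements from which to choose. If x is a single positive number n, sample() chooses from 1:n instead.
  • size: Number of elements to select.
  • replace: Whether to sample with replacement or not. Default is FALSE.
  • prob: Vector of probability weights, one for each element of x. Default is NULL, which gives every element the same chance of being selected.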

This tutorial explains how to use this function to select a random sample in R from both a vector and a data frame.

Example 1: Random Sample from a Vector

The following code shows how to select a random sample from a vector without replacement:

#create vector of data
data <- c(1, 3, 5, 6, 7, 8, 10, 11, 12, 14)

#select random sample of 5 elements without replacement
sample(x=data, size=5)

[1] 10 12  5 14  7

The following code shows how to select a random sample from a vector with replacement:

#create vector of data
data <- c(1, 3, 5, 6, 7, 8, 10, 11, 12, 14)

#select random sample of 5 elements with replacement
sample(x=data, size=5, replace=TRUE)

[1] 12  1  1  6 14
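
The prob argument from the syntax above can be used to make some elements more likely to be selected than others. The following is a minimal sketch rather than a definitive recipe: the weights vector is an assumption made up for illustration, and the weights do not need to sum to 1. No output is shown because the result changes from run to run.

```r
#create vector of data
data <- c(1, 3, 5, 6, 7, 8, 10, 11, 12, 14)

#assumed weights: the first element is nine times as likely as each of the others
weights <- c(9, rep(1, 9))

#select weighted random sample of 5 elements with replacement
weighted_sample <- sample(x=data, size=5, replace=TRUE, prob=weights)

#display weighted sample
weighted_sample
```

With these weights, we would expect the value 1 to appear in roughly half of the draws over many repetitions.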

Example 2: Random Sample from a Data Frame

The following code shows how to select a random sample from a data frame:

#create data frame
df <- data.frame(x=c(3, 5, 6, 6, 8, 12, 14),
                 y=c(12, 6, 4, 23, 25, 8, 9),
                 z=c(2, 7, 8, 8, 15, 17, 29))

#view data frame 
df

   x  y  z
1  3 12  2
2  5  6  7
3  6  4  8
4  6 23  8
5  8 25 15
6 12  8 17
7 14  9 29

#select random sample of three rows from data frame
rand_df <- df[sample(nrow(df), size=3), ]

#display randomly selected rows
rand_df

   x  y  z
4  6 23  8
7 14  9 29
1  3 12  2

Here’s what’s happening in this bit of code:

1. To select a subset of a data frame in R, we use the following syntax: df[rows, columns]

2. In the code above, sample(nrow(df), size=3) returns 3 randomly chosen row numbers for the rows position, and leaving the columns position blank keeps all columns.

3. The end result is a subset of the data frame with 3 randomly selected rows.
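
The steps above can also be written out one at a time, which may make the logic easier to follow. This is only a sketch of the same approach: the name row_ids is a hypothetical variable introduced here for illustration.

```r
#create data frame
df <- data.frame(x=c(3, 5, 6, 6, 8, 12, 14),
                 y=c(12, 6, 4, 23, 25, 8, 9),
                 z=c(2, 7, 8, 8, 15, 17, 29))

#step 1: draw 3 distinct row numbers between 1 and nrow(df)
row_ids <- sample(nrow(df), size=3)

#step 2: keep those rows and all columns (columns position left blank)
rand_df <- df[row_ids, ]

#display randomly selected rows
rand_df
```

Storing the row numbers first also makes it possible to reuse them later, for example to find the rows that were not selected with df[-row_ids, ].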

It’s important to note that each time we call the sample() function, R will generally select a different sample since the function chooses values randomly.

To make the results of an analysis reproducible, call set.seed() with a fixed number before calling sample() so that the same random sample is chosen each time. For example:

#make this example reproducible
set.seed(23)

#create data frame
df <- data.frame(x=c(3, 5, 6, 6, 8, 12, 14),
                 y=c(12, 6, 4, 23, 25, 8, 9),
                 z=c(2, 7, 8, 8, 15, 17, 29))

#select random sample of three rows from data frame
rand_df <- df[sample(nrow(df), size=3), ]

#display randomly selected rows
rand_df

   x  y  z
5  8 25 15
2  5  6  7
6 12  8 17

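Each time you run the above code, the same 3 rows of the data frame will be selected. Note that the exact rows may differ between R versions, since R 3.6.0 changed the default sampling algorithm.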
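
One quick way to check that a seed is working is to draw the same sample twice and compare the results. The following is a small sketch of that idea: the names first_draw and second_draw are hypothetical, and the seed value 23 is simply reused from the example above.

```r
#set the seed and draw a sample of 3 values from 1 to 10
set.seed(23)
first_draw <- sample(10, size=3)

#reset the seed to the same value and draw again
set.seed(23)
second_draw <- sample(10, size=3)

#check whether both draws contain the same values in the same order
identical(first_draw, second_draw)

[1] TRUE
```

The seed has to be reset before each draw that should match. Calling sample() a second time without resetting the seed continues the random sequence rather than restarting it.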

Additional Resources

Stratified Sampling in R (With Examples)
Systematic Sampling in R (With Examples)
Cluster Sampling in R (With Examples)
