To select a random sample in R, we can use the **sample()** function, which uses the following syntax:

**sample(x, size, replace = FALSE, prob = NULL)**

where:

**x:** A vector of elements from which to choose.

**size:** Sample size.

**replace:** Whether to sample with replacement or not. Default is FALSE.

**prob:** Vector of probability weights for obtaining elements from the vector. Default is NULL.
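
The examples in this tutorial leave **prob** at its default, so here is a minimal sketch of weighted sampling. The vector `items`, the `weights` values, and the seed are hypothetical and chosen only for illustration.

```r
# Hypothetical values for illustration; not part of the tutorial's data
items <- c("a", "b", "c")
weights <- c(0.7, 0.2, 0.1)  # one weight per element of items

# Arbitrary seed so the draw can be repeated
set.seed(1)

# Draw 1000 values with replacement, favoring "a" over "b" and "c"
draws <- sample(x = items, size = 1000, replace = TRUE, prob = weights)

# Share of each value in the sample; "a" should be the most frequent
prop.table(table(draws))
```

With this many draws, the observed shares should land close to the weights, though they will rarely match exactly.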

This tutorial explains how to use this function to select a random sample in R from both a vector and a data frame.

**Example 1: Random Sample from a Vector**

The following code shows how to select a random sample from a vector **without replacement**:

    #create vector of data
    data <- c(1, 3, 5, 6, 7, 8, 10, 11, 12, 14)

    #select random sample of 5 elements without replacement
    sample(x=data, size=5)

    [1] 10 12  5 14  7

The following code shows how to select a random sample from a vector **with replacement**:

    #create vector of data
    data <- c(1, 3, 5, 6, 7, 8, 10, 11, 12, 14)

    #select random sample of 5 elements with replacement
    sample(x=data, size=5, replace=TRUE)

    [1] 12  1  1  6 14
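
One practical difference between the two modes is the largest sample size allowed. The sketch below reuses the vector from the examples above and assumes a requested size of 15, which is larger than the 10 elements available. The names `result` and `big_sample` are introduced here for illustration.

```r
# Same vector as in the examples above
data <- c(1, 3, 5, 6, 7, 8, 10, 11, 12, 14)

# Without replacement, asking for more elements than exist raises an error;
# tryCatch() captures the error message instead of stopping the script
result <- tryCatch(
  sample(x = data, size = 15),
  error = function(e) conditionMessage(e)
)
result

# With replacement, the same request succeeds because values can repeat
big_sample <- sample(x = data, size = 15, replace = TRUE)
length(big_sample)
```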

**Example 2: Random Sample from a Data Frame**

The following code shows how to select a random sample from a data frame:

    #create data frame
    df <- data.frame(x=c(3, 5, 6, 6, 8, 12, 14),
                     y=c(12, 6, 4, 23, 25, 8, 9),
                     z=c(2, 7, 8, 8, 15, 17, 29))

    #view data frame
    df

       x  y  z
    1  3 12  2
    2  5  6  7
    3  6  4  8
    4  6 23  8
    5  8 25 15
    6 12  8 17
    7 14  9 29

    #select random sample of three rows from data frame
    rand_df <- df[sample(nrow(df), size=3), ]

    #display randomly selected rows
    rand_df

       x  y  z
    4  6 23  8
    7 14  9 29
    1  3 12  2

Here’s what’s happening in this bit of code:

**1.** To select a subset of a data frame in R, we use the following syntax: df[rows, columns]

**2.** In the code above, **sample(nrow(df), size=3)** randomly draws 3 row numbers between 1 and the number of rows in the data frame. Leaving the columns position blank keeps *all* columns.

**3.** The end result is a subset of the data frame with 3 randomly selected rows.
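
The steps above can also be written out one at a time, which makes the intermediate row numbers visible. This is a sketch of the same approach; the name `row_idx` is introduced here for illustration.

```r
# Same data frame as in the example above
df <- data.frame(x=c(3, 5, 6, 6, 8, 12, 14),
                 y=c(12, 6, 4, 23, 25, 8, 9),
                 z=c(2, 7, 8, 8, 15, 17, 29))

# Step 1: draw 3 distinct row numbers from 1 to nrow(df)
row_idx <- sample(nrow(df), size = 3)
row_idx

# Step 2: keep those rows; the blank after the comma keeps every column
rand_df <- df[row_idx, ]

# The result has 3 rows and the original 3 columns
dim(rand_df)
```

Storing the row numbers separately is handy if you later need the rows that were not selected, which can be retrieved with `df[-row_idx, ]`.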

It’s important to note that each time we use the **sample()** function, R will generally select a different sample, since the function chooses values randomly.

To make an analysis reproducible, call **set.seed()** with a fixed number before calling **sample()** so that the function chooses the same random sample each time. For example:

    #make this example reproducible
    set.seed(23)

    #create data frame
    df <- data.frame(x=c(3, 5, 6, 6, 8, 12, 14),
                     y=c(12, 6, 4, 23, 25, 8, 9),
                     z=c(2, 7, 8, 8, 15, 17, 29))

    #select random sample of three rows from data frame
    rand_df <- df[sample(nrow(df), size=3), ]

    #display randomly selected rows
    rand_df

       x  y  z
    5  8 25 15
    2  5  6  7
    6 12  8 17

Each time you run the above code with the same seed, the same 3 rows of the data frame will be selected.
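
One way to confirm this behavior is to reset the seed and compare two draws directly. The sketch below reuses the data frame and seed from the example above; the names `first_draw` and `second_draw` are introduced here for illustration.

```r
# Same data frame as in the example above
df <- data.frame(x=c(3, 5, 6, 6, 8, 12, 14),
                 y=c(12, 6, 4, 23, 25, 8, 9),
                 z=c(2, 7, 8, 8, 15, 17, 29))

# Set the seed, then draw 3 row numbers
set.seed(23)
first_draw <- sample(nrow(df), size = 3)

# Reset to the same seed and draw again
set.seed(23)
second_draw <- sample(nrow(df), size = 3)

# Both draws contain the same row numbers in the same order
identical(first_draw, second_draw)  # TRUE
```

Note that the seed must be set again before each draw you want to repeat. Calling **sample()** twice after a single **set.seed()** call will usually give two different samples.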

**Additional Resources**

Stratified Sampling in R (With Examples)

Systematic Sampling in R (With Examples)

Cluster Sampling in R (With Examples)
