Often you may want to select a random sample of rows by group in R.

Fortunately this is easy to do by using the **sample_n()** function along with the **group_by()** function from the **dplyr** package in R, which is designed to perform this exact task.

The **sample_n()** function uses the following basic syntax:

**sample_n(tbl, size, replace=FALSE, …)**

where:

**tbl**: The name of the data frame

**size**: The number of rows to select

**replace**: Whether to sample with replacement

Note that in most cases you will want to leave the **replace** argument set to **FALSE**, since sampling with replacement allows the same row to be included in the sample multiple times.
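To make the effect of the **replace** argument concrete, here is a minimal sketch. The data frame `toy` and its columns `g` and `x` are hypothetical and exist only for this illustration. Each group has two rows, so asking for three rows per group only works when sampling with replacement; with the default **replace=FALSE** the same call should stop with an error because the requested size exceeds the group size.

```r
library(dplyr)

# hypothetical toy data: two groups with two rows each
toy <- data.frame(g=c('a', 'a', 'b', 'b'),
                  x=1:4)

# request three rows per group, which is more than each group contains,
# so sampling with replacement is required
with_repl <- toy %>%
  group_by(g) %>%
  sample_n(size=3, replace=TRUE)

# count the sampled rows per group; each group contributes three rows,
# so at least one original row appears more than once within each group
with_repl %>% count(g)
```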

The following example shows how to use the **sample_n()** function along with the **group_by()** function from the **dplyr** package to select a random sample of rows by group.

**Example: How to Sample by Group Using dplyr**

Suppose we create the following data frame that contains information about various basketball players:

    #create data frame
    df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                     points=c(99, 68, 86, 88, 95, 74, 78, 93),
                     assists=c(22, 28, 45, 35, 34, 45, 28, 31),
                     rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

    #view data frame
    df

      team points assists rebounds
    1    A     99      22       30
    2    A     68      28       28
    3    A     86      45       24
    4    A     88      35       24
    5    B     95      34       30
    6    B     74      45       36
    7    B     78      28       30
    8    B     93      31       29

Notice that there are two unique teams in this data frame: **A** and **B**.

Suppose that we would like to select three random basketball players from each of these teams.

We can use the following syntax to do so:

    library(dplyr)

    #select three random players from each team
    df %>%
      group_by(team) %>%
      sample_n(size=3)

    # A tibble: 6 x 4
    # Groups:   team [2]
      team  points assists rebounds
    1 A         86      45       24
    2 A         99      22       30
    3 A         88      35       24
    4 B         78      28       30
    5 B         93      31       29
    6 B         95      34       30

This returns three random players from each team.
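As a side note, recent versions of dplyr (1.0.0 and later) mark **sample_n()** as superseded in favor of **slice_sample()**. Assuming such a version is installed, the following sketch shows what appears to be the equivalent call. The data frame is repeated so that the block runs on its own, and the object name `sampled` is introduced here only for illustration.

```r
library(dplyr)

# same data frame as in the example above, repeated so this block is self-contained
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 45, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

# select three random players from each team using slice_sample()
sampled <- df %>%
  group_by(team) %>%
  slice_sample(n=3)

# view the sampled rows (the rows chosen will vary from run to run)
sampled
```

One difference worth knowing: the number of rows is passed as **n** rather than **size**.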

Note that we can specify a different value for the **size** argument of the **sample_n()** function to instead return a different number of players per team.

For example, we can use the following syntax to return two random players from each team instead:

    library(dplyr)

    #select two random players from each team
    df %>%
      group_by(team) %>%
      sample_n(size=2)

    # A tibble: 4 x 4
    # Groups:   team [2]
      team  points assists rebounds
    1 A         88      35       24
    2 A         99      22       30
    3 B         78      28       30
    4 B         93      31       29

This returns two random players from each team, just as we specified.

Note that each time we run this code, the rows selected for each group may differ, since the **sample_n()** function selects rows randomly.

If you would like to make the code reproducible, you can use the **set.seed()** function to set a random “seed” so that the same random rows are selected each time the code is run.

For example, we could use the following code to do so:

    #make this example reproducible
    set.seed(1)

    library(dplyr)

    #select two random players from each team
    df %>%
      group_by(team) %>%
      sample_n(size=2)

Now each time we run this code, the same random sample of rows will be selected.
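One way to check this is to draw the sample twice and compare the results. The sketch below wraps the steps in a helper function, `draw_sample()`, which is a hypothetical name used only for this illustration. It resets the seed immediately before each draw. This matters because calling **set.seed()** once and then sampling twice in the same session would produce two different samples.

```r
library(dplyr)

# same data frame as in the example above, repeated so this block is self-contained
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(99, 68, 86, 88, 95, 74, 78, 93),
                 assists=c(22, 28, 45, 35, 34, 45, 28, 31),
                 rebounds=c(30, 28, 24, 24, 30, 36, 30, 29))

# hypothetical helper: reset the seed, then select two random players per team
draw_sample <- function() {
  set.seed(1)
  df %>%
    group_by(team) %>%
    sample_n(size=2)
}

first_draw <- draw_sample()
second_draw <- draw_sample()

# compare the two draws as plain data frames; TRUE means the same rows
# were selected in the same order both times
identical(as.data.frame(first_draw), as.data.frame(second_draw))
```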

**Note**: The complete documentation for the **sample_n()** function is available in the **dplyr** package reference.
