How to Select Unique Rows in a Data Frame in R


You can use the following methods to select unique rows from a data frame in R:

Method 1: Select Unique Rows Across All Columns

library(dplyr)

df %>% distinct()

Method 2: Select Unique Rows Based on One Column

library(dplyr)

df %>% distinct(column1, .keep_all=TRUE)

Method 3: Select Unique Rows Based on Multiple Columns

library(dplyr)

df %>% distinct(column1, column2, .keep_all=TRUE)

This tutorial explains how to use each method in practice with the following data frame:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(10, 10, 8, 14, 15, 15, 17, 17))

#view data frame
df

  team position points
1    A        G     10
2    A        G     10
3    A        F      8
4    A        F     14
5    B        G     15
6    B        G     15
7    B        F     17
8    B        F     17

Example 1: Select Unique Rows Across All Columns

The following code shows how to select rows that have unique values across all columns in the data frame:

library(dplyr)

#select rows with unique values across all columns
df %>% distinct()

  team position points
1    A        G     10
2    A        F      8
3    A        F     14
4    B        G     15
5    B        F     17

We can see that there are five unique rows in the data frame.

Note: When duplicate rows are encountered, only the first occurrence of each row is kept.
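To see which rows are treated as duplicates, one option is base R's duplicated() function, with unique() as a rough base R counterpart to distinct(). The sketch below rebuilds the example data frame so that it runs on its own; the names dup_flags and unique_rows are only illustrative.

```r
#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(10, 10, 8, 14, 15, 15, 17, 17))

#flag each row that repeats an earlier row (TRUE = dropped as a duplicate)
dup_flags <- duplicated(df)
dup_flags

#keep only the first occurrence of each row using base R
unique_rows <- unique(df)
unique_rows
```

Rows 2, 6, and 8 are flagged as duplicates, which leaves the same five rows as before. One difference is that unique() keeps the original row names (1, 3, 4, 5, 7), while distinct() renumbers the rows from 1.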

Example 2: Select Unique Rows Based on One Column

The following code shows how to select unique rows based on the team column only:

library(dplyr)

#select rows with unique values based on team column only
df %>% distinct(team, .keep_all=TRUE)

  team position points
1    A        G     10
2    B        G     15

Since there are only two unique values in the team column, only the rows with the first occurrence of each value are kept.

Note: The argument .keep_all=TRUE tells distinct() to keep all other columns in the output. Without it, only the columns named in distinct() are returned.
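To illustrate the effect of .keep_all, the sketch below runs the same call without that argument. It rebuilds the example data frame so that it runs on its own; the name team_only is only illustrative.

```r
library(dplyr)

#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(10, 10, 8, 14, 15, 15, 17, 17))

#select unique values of team without keeping the other columns
team_only <- df %>% distinct(team)
team_only
```

The result contains only the team column, with one row for A and one row for B. The position and points columns are dropped.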

Example 3: Select Unique Rows Based on Multiple Columns

The following code shows how to select unique rows based on the team and position columns only:

library(dplyr)

#select rows with unique values based on team and position columns only
df %>% distinct(team, position, .keep_all=TRUE)

  team position points
1    A        G     10
2    A        F      8
3    B        G     15
4    B        F     17

Four rows are returned, since there are four unique combinations of values across the team and position columns.
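If dplyr is not available, a similar result can be sketched in base R by applying duplicated() to just the columns of interest. This is an alternative approach rather than the method used in this tutorial, and the names key_cols and unique_combos are only illustrative.

```r
#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 position=c('G', 'G', 'F', 'F', 'G', 'G', 'F', 'F'),
                 points=c(10, 10, 8, 14, 15, 15, 17, 17))

#columns that define a unique combination
key_cols <- df[c('team', 'position')]

#keep the first row for each team and position combination
unique_combos <- df[!duplicated(key_cols), ]
unique_combos
```

This keeps rows 1, 3, 5, and 7 of the original data frame, which hold the same values as the four rows returned by distinct() above.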

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Filter for Unique Values Using dplyr
How to Filter by Multiple Conditions Using dplyr
How to Count Number of Occurrences in Columns in R
