R: How to Find Unique Rows Across Multiple Columns


You can use the following methods to find unique rows across multiple columns of a data frame in R:

Method 1: Find Unique Rows Across Multiple Columns (Drop Other Columns)

df_unique <- unique(df[c('col1', 'col2')])

Method 2: Find Unique Rows Across Multiple Columns (Keep Other Columns)

df_unique <- df[!duplicated(df[c('col1', 'col2')]),]

The examples below show how to use each method in practice with this data frame:

#create data frame
df <- data.frame(conf=c('East', 'East', 'East', 'West', 'West', 'West'),
                 pos=c('G', 'G', 'F', 'G', 'F', 'F'),
                 points=c(33, 28, 31, 39, 34, 40))

#view data frame
df

  conf pos points
1 East   G     33
2 East   G     28
3 East   F     31
4 West   G     39
5 West   F     34
6 West   F     40

Method 1: Find Unique Rows Across Multiple Columns (Drop Other Columns)

The following code shows how to find unique rows across the conf and pos columns in the data frame:

#find unique rows across conf and pos columns
df_unique <- unique(df[c('conf', 'pos')])

#view results
df_unique 

  conf pos
1 East   G
3 East   F
4 West   G
5 West   F

The result contains four rows, one for each unique combination of conf and pos.

Notice that the points column does not appear in the results, because only the conf and pos columns were passed to unique().
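The row names in this output (1, 3, 4, 5) are carried over from the original data frame, so they are not consecutive. If consecutive row names are preferred, one possible approach is to reset them, as in this minimal sketch that rebuilds the same example data frame:

```r
#recreate the example data frame
df <- data.frame(conf=c('East', 'East', 'East', 'West', 'West', 'West'),
                 pos=c('G', 'G', 'F', 'G', 'F', 'F'),
                 points=c(33, 28, 31, 39, 34, 40))

#find unique rows across conf and pos columns
df_unique <- unique(df[c('conf', 'pos')])

#reset row names so they run from 1 to the number of rows
rownames(df_unique) <- NULL

#view results
df_unique

#   conf pos
# 1 East   G
# 2 East   F
# 3 West   G
# 4 West   F
```

Resetting the row names is optional. Leaving them as-is can be useful when you want to trace each unique combination back to its row in the original data frame.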

Method 2: Find Unique Rows Across Multiple Columns (Keep Other Columns)

The following code shows how to find unique rows across the conf and pos columns in the data frame and keep the values in the points column:

#find unique rows across conf and pos columns
df_unique <- df[!duplicated(df[c('conf', 'pos')]),]

#view results
df_unique 

  conf pos points
1 East   G     33
3 East   F     31
4 West   G     39
5 West   F     34

Notice that each combination of conf and pos now appears only once and that the points column is kept.
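To see why this works, it can help to inspect the logical vector that duplicated() returns before it is negated and used to subset the rows. The sketch below rebuilds the example data frame; the name is_dup is only an illustrative variable, not part of the original code:

```r
#recreate the example data frame
df <- data.frame(conf=c('East', 'East', 'East', 'West', 'West', 'West'),
                 pos=c('G', 'G', 'F', 'G', 'F', 'F'),
                 points=c(33, 28, 31, 39, 34, 40))

#flag rows whose conf/pos combination already appeared in an earlier row
is_dup <- duplicated(df[c('conf', 'pos')])

#view the flags (rows 2 and 6 repeat an earlier combination)
is_dup

# [1] FALSE  TRUE FALSE FALSE FALSE  TRUE

#count how many rows will be dropped
sum(is_dup)

# [1] 2
```

Subsetting with !is_dup keeps every row flagged FALSE, which gives the four rows shown above.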

It’s important to note that only the first row for each combination is kept; any later rows with the same combination are dropped.

For example, there were two rows that contained “East” and “G” across the first two columns, but only the points value (33) for the first occurrence of this unique combination was kept in the final data frame.

Similarly, there were two rows that contained “West” and “F” across the first two columns, but only the points value (34) for the first occurrence of this unique combination was kept in the final data frame.
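If the last occurrence of each combination should be kept instead, one option is the fromLast argument of duplicated(), which checks for duplicates starting from the end of the data frame. The following is a minimal sketch on the same example data; the name df_last is only illustrative:

```r
#recreate the example data frame
df <- data.frame(conf=c('East', 'East', 'East', 'West', 'West', 'West'),
                 pos=c('G', 'G', 'F', 'G', 'F', 'F'),
                 points=c(33, 28, 31, 39, 34, 40))

#keep the last occurrence of each conf/pos combination
df_last <- df[!duplicated(df[c('conf', 'pos')], fromLast=TRUE),]

#view results
df_last

#   conf pos points
# 2 East   G     28
# 3 East   F     31
# 4 West   G     39
# 6 West   F     40
```

Here the points values for “East” and “G” (28) and for “West” and “F” (40) come from the last row of each combination rather than the first.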

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Find Unique Values in a Column in R
How to Count Unique Values by Group in R
How to Filter for Unique Values Using dplyr
