dplyr: How to Use anti_join to Find Unmatched Records


You can use the anti_join() function from the dplyr package in R to return all rows in one data frame that have no matching values in another data frame, where a match is determined by one or more key columns.

This function uses the following basic syntax:

anti_join(df1, df2, by='col_name')
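The key columns do not need to share a name in both data frames. The by argument also accepts a named vector of the form c('name_in_df1' = 'name_in_df2'). The sketch below uses two hypothetical data frames, sales and roster, whose key columns are called team and team_name; these names are illustrative only.

```r
library(dplyr)

# hypothetical data frames whose key columns have different names
sales <- data.frame(team=c('A', 'B', 'C'),
                    points=c(12, 14, 19))
roster <- data.frame(team_name=c('A', 'C'),
                     coach=c('Smith', 'Jones'))

# match sales$team against roster$team_name
unmatched <- anti_join(sales, roster, by=c('team' = 'team_name'))
print(unmatched)
```

Only team B is returned, and, as with any anti join, the result keeps just the columns of the first data frame (team and points); nothing from roster is added.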

The following examples show how to use this syntax in practice.

Example 1: Use anti_join() with One Column

Suppose we have the following two data frames in R:

#create data frames
df1 <- data.frame(team=c('A', 'B', 'C', 'D', 'E'),
                  points=c(12, 14, 19, 24, 36))

df2 <- data.frame(team=c('A', 'B', 'C', 'F', 'G'),
                  points=c(12, 14, 19, 33, 17))

We can use the anti_join() function to return all rows in the first data frame that do not have a matching team in the second data frame:

library(dplyr)

#perform anti join using 'team' column
anti_join(df1, df2, by='team')

  team points
1    D     24
2    E     36

We can see that exactly two teams from the first data frame (D and E) do not have a matching team name in the second data frame.
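Argument order matters: anti_join() always returns rows of its first argument. The following sketch reuses the data frames from this example to show that swapping them finds the teams that appear only in df2, and that for a single key column a filter() with %in% selects the same rows as anti_join().

```r
library(dplyr)

df1 <- data.frame(team=c('A', 'B', 'C', 'D', 'E'),
                  points=c(12, 14, 19, 24, 36))
df2 <- data.frame(team=c('A', 'B', 'C', 'F', 'G'),
                  points=c(12, 14, 19, 33, 17))

# swapping the arguments returns rows of df2 with no match in df1
only_in_df2 <- anti_join(df2, df1, by='team')
print(only_in_df2)
#   team points
# 1    F     33
# 2    G     17

# for one key column, filter() with %in% keeps the same rows of df1
only_in_df1 <- anti_join(df1, df2, by='team')
same_rows <- filter(df1, !team %in% df2$team)
print(same_rows)
```

The filter() version is handy for a quick check, but anti_join() extends naturally to several key columns, as the next example shows.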

Example 2: Use anti_join() with Multiple Columns

Suppose we have the following two data frames in R:

#create data frames
df1 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                  position=c('G', 'G', 'F', 'G', 'F', 'C'),
                  points=c(12, 14, 19, 24, 36, 41))

df2 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                  position=c('G', 'G', 'C', 'G', 'F', 'F'),
                  points=c(12, 14, 19, 33, 17, 22))

We can use the anti_join() function to return all rows in the first data frame that do not have a matching team and position in the second data frame:

library(dplyr)

#perform anti join using 'team' and 'position' columns
anti_join(df1, df2, by=c('team', 'position'))

  team position points
1    A        F     19
2    B        C     41

We can see that exactly two records from the first data frame do not have a matching combination of team and position in the second data frame.
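The choice of key columns changes the result. The sketch below reuses the data frames from this example: matching on team alone removes every row, while omitting by makes dplyr match on all shared columns (team, position and points) and print a message naming them. It also uses semi_join(), which keeps the matched rows, to confirm that the two filtering joins split df1 into complementary parts.

```r
library(dplyr)

df1 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                  position=c('G', 'G', 'F', 'G', 'F', 'C'),
                  points=c(12, 14, 19, 24, 36, 41))
df2 <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                  position=c('G', 'G', 'C', 'G', 'F', 'F'),
                  points=c(12, 14, 19, 33, 17, 22))

# matching on team alone: both A and B appear in df2, so no rows remain
by_team <- anti_join(df1, df2, by='team')
nrow(by_team)   # 0

# omitting `by` matches on every shared column (team, position, points)
by_all <- anti_join(df1, df2)
nrow(by_all)    # 4

# semi_join() keeps the matched rows, each at most once, even though
# (A, G) appears twice in df2
matched <- semi_join(df1, df2, by=c('team', 'position'))
unmatched <- anti_join(df1, df2, by=c('team', 'position'))
nrow(matched) + nrow(unmatched) == nrow(df1)   # TRUE
```

Passing by explicitly avoids silently matching on a column such as points that was never meant to be a key.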

