How to Perform an Anti-Join in Pandas


An anti-join allows you to return all rows in one dataset that do not have matching values in another dataset.

You can use the following syntax to perform an anti-join between two pandas DataFrames:

outer = df1.merge(df2, how='outer', indicator=True)

anti_join = outer[(outer._merge=='left_only')].drop('_merge', axis=1)

The following example shows how to use this syntax in practice.

Example: Perform an Anti-Join in Pandas

Suppose we have the following two pandas DataFrames:

import pandas as pd

#create first DataFrame
df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E'],
                    'points': [18, 22, 19, 14, 30]})

print(df1)

  team  points
0    A      18
1    B      22
2    C      19
3    D      14
4    E      30

#create second DataFrame
df2 = pd.DataFrame({'team': ['A', 'B', 'C', 'F', 'G'],
                    'points': [18, 22, 19, 22, 29]})

print(df2)

  team  points
0    A      18
1    B      22
2    C      19
3    F      22
4    G      29

We can use the following code to return all rows in the first DataFrame that do not have a matching team in the second DataFrame:

#perform outer join
outer = df1.merge(df2, how='outer', indicator=True)

#perform anti-join
anti_join = outer[(outer._merge=='left_only')].drop('_merge', axis=1)

#view results
print(anti_join)

  team  points
3    D      14
4    E      30

We can see that there are exactly two teams from the first DataFrame that do not have a matching team name in the second DataFrame.

The anti-join worked as expected.

The end result is one DataFrame that only contains the rows where the team name belongs to the first DataFrame but not the second DataFrame.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Do an Inner Join in Pandas
How to Do a Left Join in Pandas
How to Do a Cross Join in Pandas

Leave a Reply

Your email address will not be published.