Pandas: How to Compare Two DataFrames Row by Row


You can use the following methods to compare two pandas DataFrames row by row:

Method 1: Compare DataFrames and Only Keep Rows with Differences

df_diff = df1.compare(df2, keep_equal=True, align_axis=0)

Method 2: Compare DataFrames and Keep All Rows

df_diff = df1.compare(df2, keep_equal=True, keep_shape=True, align_axis=0) 

The following examples show how to use each method with the following pandas DataFrames:

import pandas as pd

#create first DataFrame
df1 = pd.DataFrame({'team': ['A', 'B', 'C', 'D'],
                    'points': [18, 22, 19, 14],
                    'assists': [5, 7, 7, 9]})

print(df1)

  team  points  assists
0    A      18        5
1    B      22        7
2    C      19        7
3    D      14        9

#create second DataFrame
df2 = pd.DataFrame({'team': ['A', 'B', 'C', 'E'],
                    'points': [18, 30, 19, 20],
                    'assists': [5, 7, 7, 9]})

print(df2)

  team  points  assists
0    A      18        5
1    B      30        7
2    C      19        7
3    E      20        9

Example 1: Compare DataFrames and Only Keep Rows with Differences

The following code shows how to compare the two DataFrames row by row and only keep the rows that have differences in at least one column:

#compare DataFrames and only keep rows with differences
df_diff = df1.compare(df2, keep_equal=True, align_axis=0)

#view results
print(df_diff)

        team  points
1 self     B      22
  other    B      30
3 self     D      14
  other    E      20

We can see that the DataFrames have two rows that are different.

In particular, we can see that the rows in index positions 1 and 3 of each DataFrame have different values in at least one column.

The values in the self row show the values from the first DataFrame while the values in the other row show the values from the second DataFrame.

For example, we can see:

  • The row in index position 1 of the first DataFrame contains B in the team column and 22 in the points column.
  • The row in index position 1 of the second DataFrame contains B in the team column and 30 in the points column.

Note: The argument keep_equal=True tells pandas to keep values that are equal. Otherwise, equal values are shown as NaNs.

Example 2: Compare DataFrames and Keep All Rows

The following code shows how to use the argument keep_shape=True to compare the two DataFrames row by row and keep all of the rows from the original DataFrames:

#compare DataFrames and keep all rows
df_diff = df1.compare(df2, keep_equal=True, keep_shape=True, align_axis=0)

#view results
print(df_diff)

        team  points  assists
0 self     A      18        5
  other    A      18        5
1 self     B      22        7
  other    B      30        7
2 self     C      19        7
  other    C      19        7
3 self     D      14        9
  other    E      20        9

The resulting DataFrame contains all of the rows and columns from the original DataFrames.

Note #1: The compare() function assumes that both DataFrames have the same dimensions.

Note #2: You can find the complete documentation for the pandas compare() function here.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: How to Compare Two Columns
Pandas: How to Compare Three Columns
Pandas: How to Compare Columns in Two Different DataFrames

Featured Posts

Leave a Reply

Your email address will not be published. Required fields are marked *