How to Compare Two Columns in Pandas (With Examples)


Often you may want to compare two columns in a Pandas DataFrame and write the results of the comparison to a third column.

You can easily do this by using the following syntax:

conditions = [(condition1), (condition2)]
choices = ["choice1", "choice2"]

df["new_column_name"] = np.select(conditions, choices, default)

Here’s what this code does:

  • conditions is a list of Boolean comparisons between the two columns, checked in order
  • choices is a list of the values to return, one for each condition
  • np.select returns, for each row, the choice paired with the first condition that is True, or default if no condition is True
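
Because the conditions are checked in order, an earlier condition takes precedence over a later one when both are True. The following is a minimal sketch of that behavior; the arrays a and b and the labels are made-up values for illustration only:

```python
import numpy as np

#hypothetical sample values, used only for illustration
a = np.array([1, 3, 5])
b = np.array([4, 3, 2])

#the second condition is also True wherever the first is True,
#but np.select uses the first True condition for each element
conditions = [a > b, a >= b]
choices = ["greater", "equal"]

labels = np.select(conditions, choices, default="less")
print(labels)
```

Here "equal" is only returned where a >= b is True and a > b is not, which is exactly where the two values match.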

The following example shows how to use this code in practice.

Example: Compare Two Columns in Pandas

Suppose we have the following DataFrame that shows the number of goals scored by two soccer teams in five different matches:

import numpy as np
import pandas as pd

#create DataFrame
df = pd.DataFrame({'A_points': [1, 3, 3, 3, 5],
                   'B_points': [4, 5, 2, 3, 2]})

#view DataFrame
df

   A_points  B_points
0         1         4
1         3         5
2         3         2
3         3         3
4         5         2

We can use the following code to compare the number of goals by row and output the winner of the match in a third column:

#define conditions
conditions = [df['A_points'] > df['B_points'], 
              df['A_points'] < df['B_points']]

#define choices
choices = ['A', 'B']

#create new column in DataFrame that displays results of comparisons
df['winner'] = np.select(conditions, choices, default='Tie')

#view the DataFrame
df

   A_points  B_points winner
0         1         4      B
1         3         5      B
2         3         2      A
3         3         3    Tie
4         5         2      A

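The results of the comparison are shown in the new column called winner: A or B for whichever column holds the larger value in that row, and Tie when the two values are equal.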
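
When only a yes/no answer is needed, the comparison itself can be stored directly as a Boolean column without np.select. The following is a minimal sketch of that idea using the same data; the column names A_won and same_score are hypothetical and chosen only for illustration:

```python
import pandas as pd

#recreate the DataFrame from the example
df = pd.DataFrame({'A_points': [1, 3, 3, 3, 5],
                   'B_points': [4, 5, 2, 3, 2]})

#True where column A is larger than column B (hypothetical column name)
df['A_won'] = df['A_points'] > df['B_points']

#True where both columns hold the same value (hypothetical column name)
df['same_score'] = df['A_points'].eq(df['B_points'])

print(df)
```

This is enough for a two-way outcome. np.select becomes useful once there are three or more possible results, as in the winner column.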

Notes

Here are a few things to keep in mind when comparing two columns in a pandas DataFrame:

  • The conditions and choices lists must contain the same number of items. Otherwise np.select raises a ValueError.
  • The default value specifies the value to display in the new column if none of the conditions are met. If you omit it, NumPy uses 0.
  • Both NumPy and pandas must be imported to make this code work.
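
The first two notes can be checked with a short sketch like the one below. It reuses the example data but swaps in numeric choices (1 and -1), which are an assumption made here so that the numeric default of 0 shares a type with the choices:

```python
import numpy as np
import pandas as pd

#recreate the DataFrame from the example
df = pd.DataFrame({'A_points': [1, 3, 3, 3, 5],
                   'B_points': [4, 5, 2, 3, 2]})

conditions = [df['A_points'] > df['B_points'],
              df['A_points'] < df['B_points']]

#numeric choices with no default: rows matching no condition (ties) get 0
result = np.select(conditions, [1, -1])
print(result)

#two conditions but only one choice: NumPy raises a ValueError
mismatch_error = None
try:
    np.select(conditions, ['A'], default='Tie')
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)
```

With string choices such as 'A' and 'B', it is safer to pass a string default explicitly, as the example does with 'Tie'.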

