Pandas: How to Compare Strings Between Two Columns


You can use the following basic syntax to compare strings between two columns in a pandas DataFrame:

df['col1'].str.strip().str.lower() == df['col2'].str.strip().str.lower()

The str.strip() function strips the whitespace from each string and the str.lower() function converts each string to lowercase before performing the comparison.

The following example shows how to use this syntax in practice.

Example: Compare Strings Between Two Columns in Pandas

Suppose we have the following pandas DataFrame that contains two columns with basketball team names:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team1': ['Mavs', 'Hawks', 'Nets', 'Hornets', 'Lakers'],
                   'team2': [' Mavs ', 'Jazz', 'Nets', 'Hornets ', 'LAKERS']})

#view DataFrame
print(df)

     team1     team2
0     Mavs     Mavs 
1    Hawks      Jazz
2     Nets      Nets
3  Hornets  Hornets 
4   Lakers    LAKERS

Notice that some team names contain whitespaces in random places and some team names are capitalized.

Suppose that we would like to compare the strings in each row to see if the team names are equal in each row.

If we only use the == symbol when comparing the strings, pandas will only return True if they are the same case and contain the whitespaces in the exact same positions:

#create new column that tests if strings in team columns are equal
df['equal'] = df['team1'] == df['team2']

#view updated DataFrame
print(df)

     team1     team2  equal
0     Mavs     Mavs   False
1    Hawks      Jazz  False
2     Nets      Nets   True
3  Hornets  Hornets   False
4   Lakers    LAKERS  False

Notice that only one row returns True, since this is the only row in which the strings contain the same case and the same whitespaces positions.

However, we can use the str.strip() function to strip the whitespace from each string and the str.lower() function to convert each string to lowercase before performing the comparison:

#remove whitespace and convert each string to lowercase, then compare strings
df['equal'] = df['team1'].str.strip().str.lower()==df['team2'].str.strip().str.lower()

#view updated DataFrame
print(df)

     team1     team2  equal
0     Mavs     Mavs    True
1    Hawks      Jazz  False
2     Nets      Nets   True
3  Hornets  Hornets    True
4   Lakers    LAKERS   True

Now each row returns True except for the row where the team names are “Hawks” and “Jazz” because even after stripping the whitespace and converting the strings to lowercase, these strings are not equal.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: How to Remove Specific Characters from Strings
Pandas: Check if String Contains Multiple Substrings
Pandas: How to Replace Empty Strings with NaN

Leave a Reply

Your email address will not be published. Required fields are marked *