You can use the following basic syntax to compare strings between two columns in a pandas DataFrame:
df['col1'].str.strip().str.lower() == df['col2'].str.strip().str.lower()
The str.strip() function strips the whitespace from each string and the str.lower() function converts each string to lowercase before performing the comparison.
The following example shows how to use this syntax in practice.
Example: Compare Strings Between Two Columns in Pandas
Suppose we have the following pandas DataFrame that contains two columns with basketball team names:
import pandas as pd #create DataFrame df = pd.DataFrame({'team1': ['Mavs', 'Hawks', 'Nets', 'Hornets', 'Lakers'], 'team2': [' Mavs ', 'Jazz', 'Nets', 'Hornets ', 'LAKERS']}) #view DataFrame print(df) team1 team2 0 Mavs Mavs 1 Hawks Jazz 2 Nets Nets 3 Hornets Hornets 4 Lakers LAKERS
Notice that some team names contain whitespaces in random places and some team names are capitalized.
Suppose that we would like to compare the strings in each row to see if the team names are equal in each row.
If we only use the == symbol when comparing the strings, pandas will only return True if they are the same case and contain the whitespaces in the exact same positions:
#create new column that tests if strings in team columns are equal df['equal'] = df['team1'] == df['team2'] #view updated DataFrame print(df) team1 team2 equal 0 Mavs Mavs False 1 Hawks Jazz False 2 Nets Nets True 3 Hornets Hornets False 4 Lakers LAKERS False
Notice that only one row returns True, since this is the only row in which the strings contain the same case and the same whitespaces positions.
However, we can use the str.strip() function to strip the whitespace from each string and the str.lower() function to convert each string to lowercase before performing the comparison:
#remove whitespace and convert each string to lowercase, then compare strings df['equal'] = df['team1'].str.strip().str.lower()==df['team2'].str.strip().str.lower() #view updated DataFrame print(df) team1 team2 equal 0 Mavs Mavs True 1 Hawks Jazz False 2 Nets Nets True 3 Hornets Hornets True 4 Lakers LAKERS True
Now each row returns True except for the row where the team names are “Hawks” and “Jazz” because even after stripping the whitespace and converting the strings to lowercase, these strings are not equal.
Additional Resources
The following tutorials explain how to perform other common tasks in pandas:
Pandas: How to Remove Specific Characters from Strings
Pandas: Check if String Contains Multiple Substrings
Pandas: How to Replace Empty Strings with NaN