How to Find Unique Values in Multiple Columns in Pandas


Often you may be interested in finding all of the unique values across multiple columns in a pandas DataFrame. Fortunately this is easy to do using the pandas unique() function combined with the ravel() function:

  • unique(): Returns unique values in order of appearance.
  • ravel(): Returns a flattened data series.

For example, suppose we have the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'col1': ['a', 'b', 'c', 'd', 'e'],
                   'col2': ['a', 'c', 'e', 'f', 'g'],
                   'col3': [11, 8, 10, 6, 6]})

#view DataFrame 
print(df)

  col1 col2  col3
0    a    a    11
1    b    c     8
2    c    e    10
3    d    f     6
4    e    g     6

Return Array of Unique Values

The following code shows how to find the unique values in col1 and col2:

pd.unique(df[['col1', 'col2']].values.ravel())

array(['a', 'b', 'c', 'e', 'd', 'f', 'g'], dtype=object)

From the output we can see that there are unique values across these two columns: a, b, c, d, e, f, g.

Return DataFrame of Unique Values

If you’d like to return these values as a DataFrame instead of an array, you can use the following code:

uniques = pd.unique(df[['col1', 'col2']].values.ravel())

pd.DataFrame(uniques)

	0
0	a
1	b
2	c
3	e
4	d
5	f
6	g

Return Number of Unique Values

If you simply want to know the number of unique values across multiple columns, you can use the following code:

uniques = pd.unique(df[['col1', 'col2']].values.ravel())

len(uniques)
7

This tell us that there are unique values across these two columns.

Additional Resources

How to Merge Pandas DataFrames on Multiple Columns
How to Filter a Pandas DataFrame on Multiple Conditions

Leave a Reply

Your email address will not be published.