How to Use the nunique() Function in Pandas


Often you may want to count the number of unique values in either the rows or columns of a pandas DataFrame.

The easiest way to do so is by using the nunique() function, which uses the following syntax:

DataFrame.nunique(axis=0, dropna=True)

where:

  • axis: The axis to use. 0 or 'index' (the default) returns one count per column; 1 or 'columns' returns one count per row.
  • dropna: Whether to exclude NaN values from the counts (default True)
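
As a minimal sketch of the axis argument, the snippet below uses a small made-up DataFrame (the columns x and y are purely illustrative). It shows that the integer and string forms of axis should produce the same counts.

```python
import pandas as pd

# A tiny illustrative DataFrame; row 0 contains a repeated value (1, 1)
df = pd.DataFrame({'x': [1, 1, 2],
                   'y': [1, 4, 5]})

# axis=0 (alias 'index', the default): one count per column
print(df.nunique(axis=0))
print(df.nunique(axis='index'))

# axis=1 (alias 'columns'): one count per row
print(df.nunique(axis=1))
print(df.nunique(axis='columns'))
```

Here the column counts are 2 for x and 3 for y, while the row counts are 1, 2 and 2, since the first row holds the same value twice.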

The following example shows how to use the nunique() function in practice with a pandas DataFrame.

Example: How to Use the nunique() Function in Pandas

Suppose we create the following pandas DataFrame that contains information about various basketball players:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'D'],
                   'points': [12, 12, 18, 13, np.nan, np.nan, 12, 29],
                   'assists': [10, 22, 24, 20, 14, 18, 10, 12]})

#view DataFrame
print(df)

  team  points  assists
0    A    12.0       10
1    A    12.0       22
2    B    18.0       24
3    B    13.0       20
4    C     NaN       14
5    C     NaN       18
6    C    12.0       10
7    D    29.0       12

Suppose that we would like to count the number of unique values in each column of the DataFrame.

We can use the following syntax to do so:

#count number of unique values in each column of DataFrame
df.nunique()

team       4
points     4
assists    7
dtype: int64

From the output we can see:

  • The team column has 4 unique values.
  • The points column has 4 unique values.
  • The assists column has 7 unique values.
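
To see what these counts represent, one rough cross-check is to drop the missing values from each column and count the distinct values that remain. This sketch recreates the same df as above, and its result should agree with the default df.nunique().

```python
import pandas as pd
import numpy as np

# Same DataFrame as in the example above
df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'D'],
                   'points': [12, 12, 18, 13, np.nan, np.nan, 12, 29],
                   'assists': [10, 22, 24, 20, 14, 18, 10, 12]})

# For each column, drop missing values, then count the distinct values left over
manual_counts = {col: len(df[col].dropna().unique()) for col in df.columns}
print(manual_counts)

# Compare against nunique() with its default dropna=True
print(manual_counts == df.nunique().to_dict())
```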

By default, NaN values are not included in the unique counts.

However, we can specify dropna=False within the nunique() function to include the NaN values in the unique counts.

The following syntax shows how to do so:

#count number of unique values in each column, including NaN in counts
df.nunique(dropna=False)

team       4
points     5
assists    7
dtype: int64

Notice that the points column now shows 5 unique values instead of 4 because the two NaN values in the points column are now counted together as one additional unique value.
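
A quick way to confirm that the two NaN entries are counted as a single value is to look at Series.unique(), which keeps NaN in its result. This sketch rebuilds just the points column as a standalone Series (the variable name points is only for illustration).

```python
import pandas as pd
import numpy as np

# The points column from the example, as a standalone Series
points = pd.Series([12, 12, 18, 13, np.nan, np.nan, 12, 29], name='points')

# unique() keeps NaN, and the two NaN entries collapse into one value
distinct = points.unique()
print(distinct)

# The number of distinct values with NaN included is the length of that array
print(len(distinct))
print(points.nunique(dropna=False))
```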

If you’d like, you can also count the number of unique values in just one specific column of the DataFrame.

For example, we can use the following syntax to count the number of unique values only in the points column of the DataFrame:

#count number of unique values in points column only
df['points'].nunique(dropna=False)

5

This returns a value of 5, which is the number of unique values in the points column when NaN is counted as a value.
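
For comparison, here is a short sketch (recreating the same df) that counts the points column with the default dropna=True, and then counts a hand-picked subset of columns by selecting them first.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'D'],
                   'points': [12, 12, 18, 13, np.nan, np.nan, 12, 29],
                   'assists': [10, 22, 24, 20, 14, 18, 10, 12]})

# Default dropna=True: NaN is ignored, so points has 4 unique values
print(df['points'].nunique())

# Selecting a list of columns gives one count per selected column
print(df[['team', 'points']].nunique())
```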

If we’d like, we can also specify axis=1 within the nunique() function to instead count the number of unique values in each row as opposed to each column.

We can use the following syntax to do so:

#count number of unique values in each row of DataFrame
df.nunique(axis=1)

0    3
1    3
2    3
3    3
4    2
5    2
6    3
7    3
dtype: int64

The output displays the number of unique values in each row of the pandas DataFrame.

For example, we can see:

  • The first row has 3 unique values.
  • The second row has 3 unique values.
  • The third row has 3 unique values.

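And so on. The fifth and sixth rows have only 2 unique values each because the NaN in the points column is excluded by default.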
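
If you want these row counts alongside the data, one option is to compute them and attach them as new columns. The sketch below recreates the same df; the column names n_unique and n_unique_with_nan are hypothetical and chosen only for illustration. It also computes the row counts with dropna=False, where the NaN in the rows with index 4 and 5 counts as a value.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'team': ['A', 'A', 'B', 'B', 'C', 'C', 'C', 'D'],
                   'points': [12, 12, 18, 13, np.nan, np.nan, 12, 29],
                   'assists': [10, 22, 24, 20, 14, 18, 10, 12]})

# Compute both row-wise counts first, so the new columns do not affect them
row_counts = df.nunique(axis=1)
row_counts_with_nan = df.nunique(axis=1, dropna=False)

# Attach the counts as new (hypothetically named) columns
df = df.assign(n_unique=row_counts, n_unique_with_nan=row_counts_with_nan)
print(df)
```

Computing the counts before calling assign() matters: if you counted after adding a column, the new column's value would be included in each row's count.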

Note: You can find the complete documentation for the nunique() function in the official pandas documentation.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Use qcut() in Pandas
How to Use pct_change() in Pandas
How to Create Frequency Table Based on Multiple Columns in Pandas
