How to Count Duplicates in Pandas (With Examples)


You can use the following methods to count duplicates in a pandas DataFrame:

Method 1: Count Duplicate Values in One Column

len(df['my_column'])-len(df['my_column'].drop_duplicates())

Method 2: Count Duplicate Rows

len(df)-len(df.drop_duplicates())

Method 3: Count Duplicates for Each Unique Row

df.groupby(df.columns.tolist(), as_index=False).size()

The following examples show how to use each method in practice with the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'G', 'G', 'F', 'F'],
                   'points': [5, 5, 8, 10, 5, 7, 10, 10]})

#view DataFrame
print(df)

  team position  points
0    A        G       5
1    A        G       5
2    A        G       8
3    A        F      10
4    B        G       5
5    B        G       7
6    B        F      10
7    B        F      10

Example 1: Count Duplicate Values in One Column

The following code shows how to count the number of duplicate values in the points column:

#count duplicate values in points column
len(df['points'])-len(df['points'].drop_duplicates())

4

We can see that there are 4 duplicate values in the points column.

Example 2: Count Duplicate Rows

The following code shows how to count the number of duplicate rows in the DataFrame:

#count number of duplicate rows
len(df)-len(df.drop_duplicates())

2

We can see that there are 2 duplicate rows in the DataFrame.

We can use the following syntax to view these 2 duplicate rows:

#display duplicated rows
df[df.duplicated()]

        team	position points
1	A	G	 5
7	B	F	 10

Example 3: Count Duplicates for Each Unique Row

The following code shows how to count the number of duplicates for each unique row in the DataFrame:

#display number of duplicates for each unique row
df.groupby(df.columns.tolist(), as_index=False).size()

        team	position points	size
0	A	F	 10	1
1	A	G	 5	2
2	A	G	 8	1
3	B	F	 10	2
4	B	G	 5	1
5	B	G	 7	1

The size column displays the number of duplicates for each unique row.

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Drop Duplicate Rows in Pandas
How to Drop Duplicate Columns in Pandas
How to Select Columns by Index in Pandas

Leave a Reply

Your email address will not be published.