How to Drop Duplicate Rows in a Pandas DataFrame


The easiest way to drop duplicate rows in a pandas DataFrame is to use the drop_duplicates() method, which has the following syntax:

df.drop_duplicates(subset=None, keep='first', inplace=False)

where:

  • subset: Which columns to consider when identifying duplicates. Default is all columns.
  • keep: Indicates which duplicates (if any) to keep. Default is 'first'.
    • 'first': Drop all duplicate rows except the first occurrence.
    • 'last': Drop all duplicate rows except the last occurrence.
    • False: Drop every row that has a duplicate, including the first occurrence.
  • inplace: Indicates whether to modify the DataFrame in place (True) or return a new DataFrame with duplicates removed (False, the default).
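
As a minimal sketch of the inplace argument, the snippet below uses a small throwaway DataFrame named demo (a hypothetical example, not the data used in the rest of this tutorial). It is meant only to illustrate the difference between the default behavior and inplace=True:

```python
import pandas as pd

# hypothetical throwaway data, used only to illustrate inplace
demo = pd.DataFrame({'x': [1, 1, 2], 'y': ['a', 'a', 'b']})

# default (inplace=False): a new DataFrame is returned, demo is unchanged
deduped = demo.drop_duplicates()
rows_after_default = len(demo)

# inplace=True: demo itself is modified
demo.drop_duplicates(inplace=True)
rows_after_inplace = len(demo)

print(rows_after_default, rows_after_inplace)
```

In most code, assigning the returned DataFrame to a variable is the simpler choice, since the original data stays available for comparison.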

This tutorial provides several examples of how to use this function in practice on the following DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['a', 'b', 'b', 'c', 'c', 'd'],
                   'points': [3, 7, 7, 8, 8, 9],
                   'assists': [8, 6, 7, 9, 9, 3]})

#display DataFrame
print(df)

  team  points  assists
0    a       3        8
1    b       7        6
2    b       7        7
3    c       8        9
4    c       8        9
5    d       9        3

Example 1: Remove Duplicates Across All Columns

The following code shows how to remove rows that have duplicate values across all columns:

df.drop_duplicates()

  team  points  assists
0    a       3        8
1    b       7        6
2    b       7        7
3    c       8        9
5    d       9        3

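By default (keep='first'), drop_duplicates() keeps the first occurrence of each duplicated row and drops the rest, which is why row 4 is removed and row 3 remains.

However, we could use the keep=False argument to drop every row that has a duplicate, including the first occurrence: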

df.drop_duplicates(keep=False)

  team  points  assists
0    a       3        8
1    b       7        6
2    b       7        7
5    d       9        3
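
The third option, keep='last', is not shown above. The sketch below rebuilds the same DataFrame and uses it, along with the related duplicated() method to preview which rows the default setting would drop. The variable names dup_mask and last_kept are chosen here for illustration only:

```python
import pandas as pd

# same DataFrame as in the tutorial
df = pd.DataFrame({'team': ['a', 'b', 'b', 'c', 'c', 'd'],
                   'points': [3, 7, 7, 8, 8, 9],
                   'assists': [8, 6, 7, 9, 9, 3]})

# flag rows that repeat an earlier row (True = dropped by the default keep='first')
dup_mask = df.duplicated()
print(dup_mask)

# keep the last occurrence of each duplicated row instead of the first
last_kept = df.drop_duplicates(keep='last')
print(last_kept)
```

With this data, rows 3 and 4 are identical, so keep='last' retains index 4 rather than index 3.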

Example 2: Remove Duplicates Across Specific Columns

The following code shows how to remove rows that have duplicate values in just the team and points columns. Rows 1 and 2 match on both columns, so row 2 is dropped even though its assists value differs:

df.drop_duplicates(subset=['team', 'points'])

  team  points  assists
0    a       3        8
1    b       7        6
3    c       8        9
5    d       9        3
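
The subset and keep arguments can also be combined. As a rough sketch, the code below keeps the last row for each team and points pair and then renumbers the index with reset_index(drop=True), since drop_duplicates() otherwise leaves gaps in the original index labels. The name result is an illustrative choice:

```python
import pandas as pd

# same DataFrame as in the tutorial
df = pd.DataFrame({'team': ['a', 'b', 'b', 'c', 'c', 'd'],
                   'points': [3, 7, 7, 8, 8, 9],
                   'assists': [8, 6, 7, 9, 9, 3]})

# compare rows on team and points only, keeping the last match in each group
result = df.drop_duplicates(subset=['team', 'points'], keep='last')

# renumber the rows 0..n-1 and discard the old index labels
result = result.reset_index(drop=True)
print(result)
```

Which occurrence to keep matters here because the dropped rows may differ in the columns left out of subset, as rows 1 and 2 do in the assists column.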

Additional Resources

How to Sort Values in a Pandas DataFrame
How to Filter a Pandas DataFrame on Multiple Conditions
How to Insert a Column Into a Pandas DataFrame
