You can use the following basic syntax to drop duplicates from a pandas DataFrame but keep the row with the latest timestamp:
df = df.sort_values('time').drop_duplicates(['item'], keep='last')
This particular example drops rows with duplicate values in the item column, but keeps the row with the latest timestamp in the time column.
The following example shows how to use this syntax in practice.
Example: Drop Duplicates and Keep Latest in Pandas
Suppose we have the following pandas DataFrame that contains information about the sales of various fruits at some grocery store:
import pandas as pd #create DataFrame df = pd.DataFrame({'time': ['2022-10-25 04:00:00', '2022-10-25 11:55:12', '2022-10-26 02:00:00', '2022-10-27 10:30:00', '2022-10-27 14:25:00', '2022-10-28 01:15:27'], 'item': ['apple', 'orange', 'apple', 'mango', 'mango', 'kiwi'], 'sales': [18, 22, 19, 14, 14, 11]}) #convert time column to datetime dtype df['time'] = pd.to_datetime(df['time']) #view DataFrame print(df) time item sales 0 2022-10-25 04:00:00 apple 18 1 2022-10-25 11:55:12 orange 22 2 2022-10-26 02:00:00 apple 19 3 2022-10-27 10:30:00 mango 14 4 2022-10-27 14:25:00 mango 14 5 2022-10-28 01:15:27 kiwi 11
Suppose we would like to remove all rows with duplicate values in the item column but keep the row with the latest timestamp in the time column.
We can use the following syntax to do so:
#drop duplicate rows based on value in 'item' column but keep latest timestamp df = df.sort_values('time').drop_duplicates(['item'], keep='last') #view updated DataFrame print(df) time item sales 1 2022-10-25 11:55:12 orange 22 2 2022-10-26 02:00:00 apple 19 4 2022-10-27 14:25:00 mango 14 5 2022-10-28 01:15:27 kiwi 11
Notice that the item column had multiple rows with ‘apple’ and ‘mango’ as values.
Each of these duplicate rows were removed but the row with the latest timestamp in the time column was kept.
Note: If you would like to remove rows based on duplicate values in multiple columns, simply include multiple column names in the first argument of the drop_duplicates() function.
Additional Resources
The following tutorials explain how to perform other common tasks in pandas:
How to Convert Datetime to Date in Pandas
How to Convert Columns to DateTime in Pandas
How to Sort a Pandas DataFrame by Date