You can use the following function to calculate a timedelta in months between two columns of a pandas DataFrame:
def month_diff(x, y): end = x.dt.to_period('M').view(dtype='int64') start = y.dt.to_period('M').view(dtype='int64') return end-start
The following example shows how to use this function in practice.
Example: Calculate Timedelta in Months in Pandas
Suppose we have the following pandas DataFrame:
import pandas as pd #create DataFrame df = pd.DataFrame({'event': ['A', 'B', 'C'], 'start_date': ['20210101', '20210201', '20210401'], 'end_date': ['20210608', '20210209', '20210801'] }) #convert start date and end date columns to datetime df['start_date'] = pd.to_datetime(df['start_date']) df['end_date'] = pd.to_datetime(df['end_date']) #view DataFrame print(df) event start_date end_date 0 A 2021-01-01 2021-06-08 1 B 2021-02-01 2021-02-09 2 C 2021-04-01 2021-08-01
Now suppose we’d like to calculate the timedelta (in months) between the start_date and end_date columns.
To do so, we’ll first define the following function:
#define function to calculate timedelta in months between two columns def month_diff(x, y): end = x.dt.to_period('M').view(dtype='int64') start = y.dt.to_period('M').view(dtype='int64') return end-start
Next, we’ll use this function to calculate the timedelta in months between the start_date and end_date columns:
#calculate month difference between start date and end date columns
df['month_difference'] = month_diff(df.end_date, df.start_date)
#view updated DataFrame
df
event start_date end_date month_difference
0 A 2021-01-01 2021-06-08 5
1 B 2021-02-01 2021-02-09 0
2 C 2021-04-01 2021-08-01 4
The month_difference column displays the timedelta (in months) between the start_date and end_date columns.
Additional Resources
The following tutorials explain how to perform other common operations in pandas:
How to Convert Columns to DateTime in Pandas
How to Convert Datetime to Date in Pandas
How to Extract Month from Date in Pandas