You can use the pandas.to_datetime() function to convert a string column to a datetime column in a pandas DataFrame.
When using this function, you can use the format argument to specify the format that your date is in so that you avoid errors when converting it from string to datetime.
This function uses the following basic syntax:
df['datetime'] = pd.to_datetime(df['my_date_column'], format='%m%d%Y %H:%M:%S'])
Here are the most common directives that you can provide to the format argument:
- %m: Month as zero-padded number (01, 02, … 12)
- %d: Day of the month as zero-padded number (01, 02, … 31)
- %y: Year with century as number (2020, 2021, 2022, …)
- %H: Hour (24-hour clock) as zero-padded number (00, 01, … 23)
- %I: Hour (12-hour clock) as zero-padded number (01, 02, … 12)
- %p: Either AM or PM
- %M: Minute as zero-padded number (00, 01, … 59)
- %S: Second as zero-padded number (00, 01, … 59)
For a complete list of directives, refer to this page.
The following example shows how to use the format argument within the to_datetime() function in different scenarios.
Example: Specify Format in pandas.to_datetime
Suppose we have the following pandas DataFrame that contains information about total sales made on various dates at some retail store:
import pandas as pd #create DataFrame df = pd.DataFrame({'date': ['10012023 4:15:30', '10042023 7:16:04','10062023 9:25:00', '10142023 15:30:50', '10152023 18:15:00'], 'sales': [100, 140, 235, 120, 250]}) #view DataFrame print(df) date sales 0 10012023 4:15:30 100 1 10042023 7:16:04 140 2 10062023 9:25:00 235 3 10142023 15:30:50 120 4 10152023 18:15:00 250 #view data type of each column in DataFrame print(df.dtypes) date object sales int64 dtype: object
We can see that the date column is currently a string (i.e. object) column.
Suppose we attempt to use pandas.to_datetime() to convert this column to datetime:
#attempt to convert date column to datetime format
df['date'] = pd.to_datetime(df['date'])
ParserError: month must be in 1..12: 10012023 4:15:30 present at position 0
We receive an error because the pandas.to_datetime() function doesn’t recognize the date and time format that the date column is currently in.
We can also use the format argument to specify the format of the column:
#convert date column to datetime format
df['date'] = pd.to_datetime(df['date'], format='%m%d%Y %H:%M:%S')
#view DataFrame
print(df)
date sales
0 2023-10-01 04:15:30 100
1 2023-10-04 07:16:04 140
2 2023-10-06 09:25:00 235
3 2023-10-14 15:30:50 120
4 2023-10-15 18:15:00 250
#view updated type of each column
print(df.dtypes)
date datetime64[ns]
sales int64
dtype: object
We can see that the date column has been successfully converted to a datetime column and we didn’t receive any error since we used the format argument to specify the exact format the date column was in.
Note: You can find the complete documentation for the pandas to_datetime() function here.
Additional Resources
The following tutorials explain how to perform other common operations in pandas:
How to Create a Date Range in Pandas
How to Convert Timestamp to Datetime in Pandas
How to Calculate a Difference Between Two Dates in Pandas