You can use the following syntax to convert a string column to a date column in a PySpark DataFrame:
from pyspark.sql import functions as F
df = df.withColumn('my_date_column', F.to_date('my_date_column'))
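This example converts the values in my_date_column from strings to dates. When no format is passed, to_date follows Spark's rules for casting a string to a date, which accept ISO-style values such as 2023-01-15.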
The following example shows how to use this syntax in practice.
Example: How to Convert String to Date in PySpark
Suppose we have the following PySpark DataFrame that contains information about sales made on various dates at some company:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
#define data
data = [['2023-01-15', 225],
['2023-02-24', 260],
['2023-07-14', 413],
['2023-10-30', 368]]
#define column names
columns = ['date', 'sales']
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)
#view dataframe
df.show()
+----------+-----+
|      date|sales|
+----------+-----+
|2023-01-15|  225|
|2023-02-24|  260|
|2023-07-14|  413|
|2023-10-30|  368|
+----------+-----+
We can use the following syntax to display the data type of each column in the DataFrame:
#check data type of each column
df.dtypes
[('date', 'string'), ('sales', 'bigint')]
We can see that the date column currently has a data type of string.
To convert this column from a string to a date, we can use the following syntax:
from pyspark.sql import functions as F
#convert 'date' column from string to date
df = df.withColumn('date', F.to_date('date'))
#view updated DataFrame
df.show()
+----------+-----+
|      date|sales|
+----------+-----+
|2023-01-15|  225|
|2023-02-24|  260|
|2023-07-14|  413|
|2023-10-30|  368|
+----------+-----+
The output looks the same as before because show() displays dates in yyyy-MM-dd form, so we can check the dtypes attribute once again to confirm that the data type of the column has changed:
#check data type of each column
df.dtypes
[('date', 'date'), ('sales', 'bigint')]
We can see that the date column now has a data type of date.
We have successfully converted a string column to a date column.