You can use the following syntax to convert a column from a date to a string in PySpark:
from pyspark.sql.functions import date_format
df_new = df.withColumn('date_string', date_format('date', 'MM/dd/yyyy'))
This particular example converts the dates in the date column to strings in a new column called date_string, using MM/dd/yyyy as the date format.
The following example shows how to use this syntax in practice.
Example: How to Convert Column from Date to String in PySpark
Suppose we have the following PySpark DataFrame that contains information about sales made on various days for some company:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
import datetime
#define data
data = [[datetime.date(2023, 10, 30), 136],
        [datetime.date(2023, 11, 14), 223],
        [datetime.date(2023, 11, 22), 450],
        [datetime.date(2023, 11, 25), 290],
        [datetime.date(2023, 12, 19), 189]]
#define column names
columns = ['date', 'sales']
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)
#view dataframe
df.show()
+----------+-----+
|      date|sales|
+----------+-----+
|2023-10-30|  136|
|2023-11-14|  223|
|2023-11-22|  450|
|2023-11-25|  290|
|2023-12-19|  189|
+----------+-----+
We can use the dtypes attribute to check the data type of each column in the DataFrame:
#check data type of each column
df.dtypes
[('date', 'date'), ('sales', 'bigint')]
We can see that the date column currently has a data type of date.
To convert this column from a date to a string, we can use the following syntax:
from pyspark.sql.functions import date_format
#create new column that converts dates to strings
df_new = df.withColumn('date_string', date_format('date', 'MM/dd/yyyy'))
#view new DataFrame
df_new.show()
+----------+-----+-----------+
|      date|sales|date_string|
+----------+-----+-----------+
|2023-10-30|  136| 10/30/2023|
|2023-11-14|  223| 11/14/2023|
|2023-11-22|  450| 11/22/2023|
|2023-11-25|  290| 11/25/2023|
|2023-12-19|  189| 12/19/2023|
+----------+-----+-----------+
We can use the dtypes attribute once again, this time on the new DataFrame, to view the data type of each column:
#check data type of each column in new DataFrame
df_new.dtypes
[('date', 'date'), ('sales', 'bigint'), ('date_string', 'string')]
We can see that the date_string column has a data type of string.
We have successfully created a string column from a date column.
Note: We used MM/dd/yyyy as the pattern within the date_format function, but you can use any pattern that Spark's datetime pattern syntax supports, such as yyyy-MM-dd or dd-MM-yyyy.
Additional Resources
The following tutorials explain how to perform other common tasks in PySpark:
How to Convert String to Integer in PySpark
How to Convert String to Date in PySpark
How to Convert String to Timestamp in PySpark