You can use the following syntax to convert epoch time to a recognizable datetime in PySpark:
from pyspark.sql import functions as f
from pyspark.sql import types as t
df.withColumn('datetime', f.to_timestamp(df.epoch.cast(dataType=t.TimestampType())))
This particular example creates a new column called datetime that holds each value from the epoch column, interpreted as seconds since 1970-01-01 00:00:00 UTC, as a recognizable datetime.
For example, this syntax will convert an epoch time of 1655439422 to a PySpark datetime of 2022-06-17 00:17:02.
The following example shows how to use this syntax in practice.
Example: How to Convert Epoch to Datetime in PySpark
Suppose we have the following PySpark DataFrame that contains information about sales made at a company, with the time of each sale stored as an epoch time:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
#define data
data = [[1655439422, 18],
        [1655638422, 33],
        [1664799422, 12],
        [1668439411, 15],
        [1669939422, 19],
        [1669993948, 24]]
#define column names
columns = ['epoch', 'sales']
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)
#view dataframe
df.show()
+----------+-----+
|     epoch|sales|
+----------+-----+
|1655439422|   18|
|1655638422|   33|
|1664799422|   12|
|1668439411|   15|
|1669939422|   19|
|1669993948|   24|
+----------+-----+
We can use the following syntax to create a new DataFrame with a column called datetime that holds each value in the epoch column converted to a recognizable datetime format:
from pyspark.sql import functions as f
from pyspark.sql import types as t
#create new column called 'datetime' that converts epoch to datetime
df_new = df.withColumn('datetime', f.to_timestamp(df.epoch.cast(dataType=t.TimestampType())))
#view new DataFrame
df_new.show()
+----------+-----+-------------------+
|     epoch|sales|           datetime|
+----------+-----+-------------------+
|1655439422|   18|2022-06-17 00:17:02|
|1655638422|   33|2022-06-19 07:33:42|
|1664799422|   12|2022-10-03 08:17:02|
|1668439411|   15|2022-11-14 10:23:31|
|1669939422|   19|2022-12-01 19:03:42|
|1669993948|   24|2022-12-02 10:12:28|
+----------+-----+-------------------+
Notice that the values in the datetime column contain recognizable datetimes.
For example:
- The epoch time 1655439422 is equivalent to 2022-06-17 00:17:02.
- The epoch time 1655638422 is equivalent to 2022-06-19 07:33:42.
- The epoch time 1664799422 is equivalent to 2022-10-03 08:17:02.
And so on.
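These values can be sanity-checked outside Spark with Python's standard library. The sketch below is not part of the original example: it converts the first epoch value to UTC and then to a fixed UTC-4 offset, which is an assumption about the machine that produced the output above.

```python
from datetime import datetime, timedelta, timezone

epoch = 1655439422

# Epoch seconds are counted from 1970-01-01 00:00:00 UTC
as_utc = datetime.fromtimestamp(epoch, tz=timezone.utc)

# Assumption: the machine that produced the table was 4 hours behind UTC
assumed_local = timezone(timedelta(hours=-4))
as_local = as_utc.astimezone(assumed_local)

utc_text = as_utc.strftime("%Y-%m-%d %H:%M:%S")
local_text = as_local.strftime("%Y-%m-%d %H:%M:%S")

print(utc_text)    # 2022-06-17 04:17:02
print(local_text)  # 2022-06-17 00:17:02
```

The second line matches the datetime shown for the first row. A fixed offset is used only to keep the sketch simple; the November and December rows line up with a UTC-5 offset instead, which is consistent with a time zone that observes daylight saving time.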
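Note: PySpark renders timestamps in the session time zone (the spark.sql.session.timeZone setting), which defaults to the local time zone of the machine running Spark, so the same epoch value can display differently on another machine.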