You can use the following syntax to add time to a datetime in PySpark:
from pyspark.sql import functions as F
df = df.withColumn('new_ts', df.ts + F.expr('INTERVAL 3 HOURS 5 MINUTES 2 SECONDS'))
This particular example creates a new column called new_ts that adds 3 hours, 5 minutes and 2 seconds to each datetime in the ts column.
Note: To subtract time, simply use a minus sign (-) instead of a plus sign (+).
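For example, here is a minimal sketch of the same operation with subtraction (it assumes the same df with a timestamp column named ts):
from pyspark.sql import functions as F
#subtract 3 hours, 5 minutes and 2 seconds from each datetime in 'ts' column
df = df.withColumn('new_ts', df.ts - F.expr('INTERVAL 3 HOURS 5 MINUTES 2 SECONDS'))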
The following example shows how to use this syntax in practice.
Example: How to Add Time to Datetime in PySpark
Suppose we have the following PySpark DataFrame that contains information about sales made at various timestamps at some company:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
spark = SparkSession.builder.getOrCreate()
#define data
data = [['2023-01-15 04:14:22', 225],
['2023-02-24 10:55:01', 260],
['2023-07-14 18:34:59', 413],
['2023-10-30 22:20:05', 368]]
#define column names
columns = ['ts', 'sales']
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)
#convert string column to timestamp
df = df.withColumn('ts', F.to_timestamp('ts', 'yyyy-MM-dd HH:mm:ss'))
#view DataFrame
df.show()
+-------------------+-----+
| ts|sales|
+-------------------+-----+
|2023-01-15 04:14:22| 225|
|2023-02-24 10:55:01| 260|
|2023-07-14 18:34:59| 413|
|2023-10-30 22:20:05| 368|
+-------------------+-----+
We can use the following syntax to create a new column called new_time that adds 3 hours, 5 minutes and 2 seconds to each datetime in the ts column of the DataFrame:
from pyspark.sql import functions as F
#add 3 hours, 5 minutes and 2 seconds to each datetime in 'ts' column
df = df.withColumn('new_time', df.ts + F.expr('INTERVAL 3 HOURS 5 MINUTES 2 SECONDS'))
#view updated DataFrame
df.show()
+-------------------+-----+-------------------+
| ts|sales| new_time|
+-------------------+-----+-------------------+
|2023-01-15 04:14:22| 225|2023-01-15 07:19:24|
|2023-02-24 10:55:01| 260|2023-02-24 14:00:03|
|2023-07-14 18:34:59| 413|2023-07-14 21:40:01|
|2023-10-30 22:20:05| 368|2023-10-31 01:25:07|
+-------------------+-----+-------------------+
The new_time column shows each time from the ts column with 3 hours, 5 minutes and 2 seconds added to it.
Note that you can also add only hours by using the following syntax:
from pyspark.sql import functions as F
#add 3 hours to each datetime in 'ts' column
df = df.withColumn('new_time', df.ts + F.expr('INTERVAL 3 HOURS'))
#view updated DataFrame
df.show()
+-------------------+-----+-------------------+
| ts|sales| new_time|
+-------------------+-----+-------------------+
|2023-01-15 04:14:22| 225|2023-01-15 07:14:22|
|2023-02-24 10:55:01| 260|2023-02-24 13:55:01|
|2023-07-14 18:34:59| 413|2023-07-14 21:34:59|
|2023-10-30 22:20:05| 368|2023-10-31 01:20:05|
+-------------------+-----+-------------------+
Now the new_time column shows each time from the ts column with 3 hours added to it.
Feel free to use this syntax to add or subtract as much time as you’d like from a datetime column.
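For instance, here is a minimal sketch that adds days as well as hours (the new_time2 column name is just for illustration):
from pyspark.sql import functions as F
#add 2 days and 4 hours to each datetime in 'ts' column
df = df.withColumn('new_time2', df.ts + F.expr('INTERVAL 2 DAYS 4 HOURS'))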
Additional Resources
The following tutorials explain how to perform other common tasks in PySpark:
How to Convert Timestamp to Date in PySpark
How to Convert String to Timestamp in PySpark
How to Calculate Difference Between Two Times in PySpark