# PySpark: How to Calculate Difference Between Two Times

You can use the following syntax to calculate a difference between two times in a PySpark DataFrame:

```python
from pyspark.sql.functions import col

df_new = df.withColumn('seconds_diff', col('end_time').cast('long') - col('start_time').cast('long'))\
           .withColumn('minutes_diff', (col('end_time').cast('long') - col('start_time').cast('long'))/60)\
           .withColumn('hours_diff', (col('end_time').cast('long') - col('start_time').cast('long'))/3600)
```

This particular example calculates the difference between the times in the start_time and end_time columns of a DataFrame in terms of seconds, minutes, and hours. It works because casting a timestamp column to long converts it to Unix epoch seconds, so subtracting the two casts gives the elapsed time in seconds.
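You can sanity-check the arithmetic for a single row using plain Python datetimes (the values below are taken from the example DataFrame later in this article):

```python
from datetime import datetime

#one row of the example data: start and end times
start = datetime(2023, 2, 24, 10, 55, 1)
end = datetime(2023, 2, 24, 11, 14, 30)

#elapsed time in seconds, then converted to minutes and hours
seconds = (end - start).total_seconds()
minutes = seconds / 60
hours = seconds / 3600

print(seconds)  #1169.0
```

This mirrors what the PySpark expression computes for each row, just without a Spark session.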

The following example shows how to use this syntax in practice.

## Example: How to Calculate Difference Between Two Times in PySpark

Suppose we have the following PySpark DataFrame that contains a column of start times for some activity and a column of end times:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

#define data
data = [['2023-01-15 04:14:22', '2023-01-18 04:15:00'],
['2023-02-24 10:55:01', '2023-02-24 11:14:30'],
['2023-07-14 18:34:59', '2023-07-14 18:35:22'],
['2023-10-30 22:20:05', '2023-11-02 07:55:00']]

#define column names
columns = ['start_time', 'end_time']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#convert string columns to timestamp
df = df.withColumn('start_time', F.to_timestamp('start_time', 'yyyy-MM-dd HH:mm:ss'))\
.withColumn('end_time', F.to_timestamp('end_time', 'yyyy-MM-dd HH:mm:ss'))

#view DataFrame
df.show()

+-------------------+-------------------+
|         start_time|           end_time|
+-------------------+-------------------+
|2023-01-15 04:14:22|2023-01-18 04:15:00|
|2023-02-24 10:55:01|2023-02-24 11:14:30|
|2023-07-14 18:34:59|2023-07-14 18:35:22|
|2023-10-30 22:20:05|2023-11-02 07:55:00|
+-------------------+-------------------+
```
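Note that to_timestamp uses Spark's Java-style datetime patterns ('yyyy-MM-dd HH:mm:ss'), not Python's strptime codes. If you ever need to parse the same string in plain Python, the equivalent format would be:

```python
from datetime import datetime

#Spark pattern 'yyyy-MM-dd HH:mm:ss' corresponds to
#Python strptime codes '%Y-%m-%d %H:%M:%S'
ts = datetime.strptime('2023-01-15 04:14:22', '%Y-%m-%d %H:%M:%S')

print(ts)  #2023-01-15 04:14:22
```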

We can use the following syntax to create a new DataFrame that contains three new columns that display the time difference between each start and end time in terms of seconds, minutes and hours:

```python
from pyspark.sql.functions import col

#create new DataFrame with time differences
df_new = df.withColumn('seconds_diff', col('end_time').cast('long') - col('start_time').cast('long'))\
.withColumn('minutes_diff', (col('end_time').cast('long') - col('start_time').cast('long'))/60)\
.withColumn('hours_diff', (col('end_time').cast('long') - col('start_time').cast('long'))/3600)

#view new DataFrame
df_new.show()

+-------------------+-------------------+------------+-------------------+--------------------+
|         start_time|           end_time|seconds_diff|       minutes_diff|          hours_diff|
+-------------------+-------------------+------------+-------------------+--------------------+
|2023-01-15 04:14:22|2023-01-18 04:15:00|      259238|  4320.633333333333|   72.01055555555556|
|2023-02-24 10:55:01|2023-02-24 11:14:30|        1169| 19.483333333333334| 0.32472222222222225|
|2023-07-14 18:34:59|2023-07-14 18:35:22|          23|0.38333333333333336|0.006388888888888889|
|2023-10-30 22:20:05|2023-11-02 07:55:00|      207295| 3454.9166666666665|  57.581944444444446|
+-------------------+-------------------+------------+-------------------+--------------------+
```

The resulting DataFrame contains the following three new columns:

- **seconds_diff**: The difference between each start and end time, in seconds.
- **minutes_diff**: The difference between each start and end time, in minutes.
- **hours_diff**: The difference between each start and end time, in hours.
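All three columns are derived from the same seconds difference, so you can verify any row by hand. For the first row (2023-01-15 04:14:22 to 2023-01-18 04:15:00), the gap is three days plus 38 seconds:

```python
#first row: 3 days (259200 s) plus 38 s
seconds = 259238

minutes = seconds / 60    #4320.633...
hours = seconds / 3600    #72.0105...

print(minutes, hours)
```

These match the minutes_diff and hours_diff values shown in the output above.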

Note that we used the withColumn function three times to return a new DataFrame with three columns added to the existing DataFrame.

You can find the complete documentation for the PySpark withColumn function in the official PySpark API reference.