You can use the following methods to extract the minutes from a timestamp in PySpark:
Method 1: Extract Minutes from Timestamp
from pyspark.sql import functions as F
df_new = df.withColumn('minutes', F.minute(df['ts']))
If the timestamp is 2023-01-15 04:14:22, this syntax returns the integer 14.
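As a rough illustration of what `F.minute` computes, here is a plain-Python sketch (not PySpark, so it runs without a Spark session) using the standard `datetime` module. The helper name `extract_minute` is hypothetical. It keeps only the minute-of-hour field, an integer from 0 to 59:

```python
from datetime import datetime

def extract_minute(ts: str) -> int:
    """Hypothetical helper: return the minute-of-hour (0-59) of a timestamp string."""
    # Parse using the same layout as the example data (yyyy-MM-dd HH:mm:ss)
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    # Only the minute field survives; date, hour and seconds are discarded
    return dt.minute

print(extract_minute("2023-01-15 04:14:22"))  # 14
```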
Method 2: Extract Timestamp Truncated to Minutes
from pyspark.sql import functions as F
df_new = df.withColumn('minutes', F.date_trunc('minute', df['ts']))
If the timestamp is 2023-01-15 04:14:22, this syntax returns the timestamp 2023-01-15 04:14:00.
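Truncating to the minute amounts to zeroing every field smaller than a minute. A minimal plain-Python sketch of that idea (not PySpark; the helper name `truncate_to_minute` is hypothetical) looks like this:

```python
from datetime import datetime

def truncate_to_minute(ts: str) -> datetime:
    """Hypothetical helper: drop the seconds and fractional seconds of a timestamp."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    # Zero out everything smaller than a minute; year through minute are unchanged
    return dt.replace(second=0, microsecond=0)

print(truncate_to_minute("2023-01-15 04:14:22"))  # 2023-01-15 04:14:00
```

Note that the result is still a full datetime rather than an integer, mirroring how `F.date_trunc` returns a timestamp column instead of a number.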
The following examples show how to use each method in practice with this PySpark DataFrame, which contains information about sales made by a company at various timestamps:
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
spark = SparkSession.builder.getOrCreate()
#define data
data = [['2023-01-15 04:14:22', 225],
['2023-02-24 10:55:01', 260],
['2023-07-14 18:34:59', 413],
['2023-10-30 22:20:05', 368]]
#define column names
columns = ['ts', 'sales']
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)
#convert string column to timestamp
df = df.withColumn('ts', F.to_timestamp('ts', 'yyyy-MM-dd HH:mm:ss'))
#view dataframe
df.show()
+-------------------+-----+
|                 ts|sales|
+-------------------+-----+
|2023-01-15 04:14:22|  225|
|2023-02-24 10:55:01|  260|
|2023-07-14 18:34:59|  413|
|2023-10-30 22:20:05|  368|
+-------------------+-----+
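In the pattern passed to `F.to_timestamp`, Spark uses `MM` for the month and `mm` for the minute, so mixing up the case is an easy way to end up with wrong minute values. The sketch below parses the same four rows in plain Python (not PySpark) with the equivalent `strptime` codes, where `%m` is the month and `%M` is the minute; the names `FMT`, `rows` and `parsed` are just for illustration:

```python
from datetime import datetime

# Spark pattern 'yyyy-MM-dd HH:mm:ss' written with strptime codes:
#   yyyy -> %Y, MM (month) -> %m, dd -> %d, HH -> %H, mm (minute) -> %M, ss -> %S
FMT = "%Y-%m-%d %H:%M:%S"

rows = [("2023-01-15 04:14:22", 225),
        ("2023-02-24 10:55:01", 260),
        ("2023-07-14 18:34:59", 413),
        ("2023-10-30 22:20:05", 368)]

# Parse each timestamp string; swapping %m and %M would confuse month and minute
parsed = [(datetime.strptime(ts, FMT), sales) for ts, sales in rows]

for dt, sales in parsed:
    print(dt.month, dt.minute, sales)
```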
Example 1: Extract Minutes from Timestamp
We can use the following syntax to extract only the minutes from each timestamp in the ts column of the DataFrame:
from pyspark.sql import functions as F
#extract minutes from each timestamp in 'ts' column
df_new = df.withColumn('minutes', F.minute(df['ts']))
#view new DataFrame
df_new.show()
+-------------------+-----+-------+
|                 ts|sales|minutes|
+-------------------+-----+-------+
|2023-01-15 04:14:22|  225|     14|
|2023-02-24 10:55:01|  260|     55|
|2023-07-14 18:34:59|  413|     34|
|2023-10-30 22:20:05|  368|     20|
+-------------------+-----+-------+
The new minutes column shows only the minutes from each timestamp in the ts column.
Example 2: Extract Timestamp Truncated to Minutes
We can use the following syntax to return each timestamp from the ts column truncated to the minute:
from pyspark.sql import functions as F
#create new column that contains each timestamp truncated to the minute
df_new = df.withColumn('minutes', F.date_trunc('minute', df['ts']))
#view new DataFrame
df_new.show()
+-------------------+-----+-------------------+
|                 ts|sales|            minutes|
+-------------------+-----+-------------------+
|2023-01-15 04:14:22|  225|2023-01-15 04:14:00|
|2023-02-24 10:55:01|  260|2023-02-24 10:55:00|
|2023-07-14 18:34:59|  413|2023-07-14 18:34:00|
|2023-10-30 22:20:05|  368|2023-10-30 22:20:00|
+-------------------+-----+-------------------+
The new minutes column shows each timestamp from the ts column truncated to the minute.
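The two methods answer different questions: extracting the minute keeps only the minute-of-hour, so events from different hours or days can share the same value, while truncating keeps the date and hour and only drops the seconds. The plain-Python sketch below (not PySpark) makes this concrete by counting events under each approach; the three timestamps are hypothetical, chosen so that they share the same minute-of-hour:

```python
from collections import Counter
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical timestamps: same minute-of-hour (14), two different hours
events = ["2023-01-15 04:14:22", "2023-01-15 09:14:50", "2023-01-15 09:14:05"]
parsed = [datetime.strptime(ts, FMT) for ts in events]

# Method 1 analogue: keep only the minute-of-hour (0-59)
by_minute = Counter(dt.minute for dt in parsed)

# Method 2 analogue: keep the full timestamp, truncated to the minute
by_truncated = Counter(dt.replace(second=0, microsecond=0) for dt in parsed)

print(by_minute)          # Counter({14: 3})
print(len(by_truncated))  # 2
```

Grouping by the extracted minute merges all three events into one bucket, whereas grouping by the truncated timestamp yields one bucket per distinct minute in time.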