You can use the following methods to find the day of the week for dates in a PySpark DataFrame:
Method 1: Get Day of Week as Number (Sunday=1)
import pyspark.sql.functions as F
df_new = df.withColumn('day_of_week', F.dayofweek('date'))
Method 2: Get Day of Week as Number (Monday=1)
import pyspark.sql.functions as F
df_new = df.withColumn('day_of_week', ((F.dayofweek('date')+5)%7)+1)
Method 3: Get Day of Week as Abbreviated Name (e.g. Mon)
import pyspark.sql.functions as F
df_new = df.withColumn('day_of_week', F.date_format('date', 'E'))
Method 4: Get Day of Week as Full Name (e.g. Monday)
import pyspark.sql.functions as F
df_new = df.withColumn('day_of_week', F.date_format('date', 'EEEE'))
The following examples show how to use each method in practice with this PySpark DataFrame:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
#define data
data = [['2023-04-11', 22],
        ['2023-04-15', 14],
        ['2023-04-17', 12],
        ['2023-05-21', 15],
        ['2023-05-23', 30],
        ['2023-10-26', 45],
        ['2023-10-28', 32],
        ['2023-10-29', 47]]
#define column names
columns = ['date', 'sales']
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)
#view dataframe
df.show()
+----------+-----+
|      date|sales|
+----------+-----+
|2023-04-11|   22|
|2023-04-15|   14|
|2023-04-17|   12|
|2023-05-21|   15|
|2023-05-23|   30|
|2023-10-26|   45|
|2023-10-28|   32|
|2023-10-29|   47|
+----------+-----+
Example 1: Get Day of Week as Number (Sunday=1)
We can use the following syntax to get the day of the week as a number between 1 and 7, assuming Sunday is the start of the week:
import pyspark.sql.functions as F
#add new column that displays day of week as number
df_new = df.withColumn('day_of_week', F.dayofweek('date'))
#view new DataFrame
df_new.show()
+----------+-----+-----------+
|      date|sales|day_of_week|
+----------+-----+-----------+
|2023-04-11|   22|          3|
|2023-04-15|   14|          7|
|2023-04-17|   12|          2|
|2023-05-21|   15|          1|
|2023-05-23|   30|          3|
|2023-10-26|   45|          5|
|2023-10-28|   32|          7|
|2023-10-29|   47|          1|
+----------+-----+-----------+
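As a sanity check on the Sunday=1 convention, the same numbers can be reproduced outside Spark with Python's standard library. This is only a sketch: the helper name `sunday_first_day_number` is made up for illustration, and it relies on `date.isoweekday()` returning 1 for Monday through 7 for Sunday.

```python
from datetime import date

def sunday_first_day_number(date_str):
    """Return 1 (Sunday) through 7 (Saturday) for an ISO 'YYYY-MM-DD' string.

    Hypothetical helper for illustration; not part of PySpark.
    """
    # isoweekday() gives Monday=1 ... Sunday=7, so shift Sunday to the front
    return date.fromisoformat(date_str).isoweekday() % 7 + 1

# First four dates from the example DataFrame
sample_dates = ['2023-04-11', '2023-04-15', '2023-04-17', '2023-05-21']
numbers = [sunday_first_day_number(d) for d in sample_dates]
print(numbers)  # [3, 7, 2, 1]
```

These values match the first four rows of the day_of_week column above.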
Example 2: Get Day of Week as Number (Monday=1)
We can use the following syntax to get the day of the week as a number between 1 and 7, assuming Monday is the start of the week:
import pyspark.sql.functions as F
#add new column that displays day of week as number
df_new = df.withColumn('day_of_week', ((F.dayofweek('date')+5)%7)+1)
#view new DataFrame
df_new.show()
+----------+-----+-----------+
|      date|sales|day_of_week|
+----------+-----+-----------+
|2023-04-11|   22|          2|
|2023-04-15|   14|          6|
|2023-04-17|   12|          1|
|2023-05-21|   15|          7|
|2023-05-23|   30|          2|
|2023-10-26|   45|          4|
|2023-10-28|   32|          6|
|2023-10-29|   47|          7|
+----------+-----+-----------+
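The arithmetic in ((F.dayofweek('date')+5)%7)+1 is easier to trust once it is written out for all seven inputs. The sketch below applies the same expression to plain integers; the function name `to_monday_first` is an assumption made for this illustration and is not a PySpark function.

```python
def to_monday_first(sunday_first):
    """Map Sunday=1..Saturday=7 onto Monday=1..Sunday=7.

    Hypothetical helper; mirrors the column expression used above.
    """
    return ((sunday_first + 5) % 7) + 1

# Show the full mapping for every possible input
mapping = {d: to_monday_first(d) for d in range(1, 8)}
print(mapping)  # {1: 7, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6}
```

Sunday (1) moves to 7, and every other day shifts down by one, which is why Monday becomes 1.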
Example 3: Get Day of Week as Abbreviated Name
We can use the following syntax to get the day of the week as an abbreviated name:
import pyspark.sql.functions as F
#add new column that displays day of week as abbreviated name
df_new = df.withColumn('day_of_week', F.date_format('date', 'E'))
#view new DataFrame
df_new.show()
+----------+-----+-----------+
|      date|sales|day_of_week|
+----------+-----+-----------+
|2023-04-11|   22|        Tue|
|2023-04-15|   14|        Sat|
|2023-04-17|   12|        Mon|
|2023-05-21|   15|        Sun|
|2023-05-23|   30|        Tue|
|2023-10-26|   45|        Thu|
|2023-10-28|   32|        Sat|
|2023-10-29|   47|        Sun|
+----------+-----+-----------+
Example 4: Get Day of Week as Full Name
We can use the following syntax to get the day of the week as a full name:
import pyspark.sql.functions as F
#add new column that displays day of week as full name
df_new = df.withColumn('day_of_week', F.date_format('date', 'EEEE'))
#view new DataFrame
df_new.show()
+----------+-----+-----------+
|      date|sales|day_of_week|
+----------+-----+-----------+
|2023-04-11|   22|    Tuesday|
|2023-04-15|   14|   Saturday|
|2023-04-17|   12|     Monday|
|2023-05-21|   15|     Sunday|
|2023-05-23|   30|    Tuesday|
|2023-10-26|   45|   Thursday|
|2023-10-28|   32|   Saturday|
|2023-10-29|   47|     Sunday|
+----------+-----+-----------+
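If the numeric column from Example 1 is already available, the names can also be looked up from the number. The sketch below does this in plain Python with a hard-coded English list; the list `DAY_NAMES` and the helper `day_names` are assumptions for illustration, and slicing to three letters only happens to match the abbreviations shown earlier for English names.

```python
# English day names ordered to match the Sunday=1 numbering
DAY_NAMES = ['Sunday', 'Monday', 'Tuesday', 'Wednesday',
             'Thursday', 'Friday', 'Saturday']

def day_names(sunday_first):
    """Return (abbreviated, full) names for a day number 1..7 (Sunday=1).

    Hypothetical helper for illustration; not part of PySpark.
    """
    full = DAY_NAMES[sunday_first - 1]
    return full[:3], full

print(day_names(3))  # ('Tue', 'Tuesday')
print(day_names(7))  # ('Sat', 'Saturday')
```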