You can use the following syntax to group rows by week in a PySpark DataFrame:

from pyspark.sql.functions import weekofyear, sum

df.groupBy(weekofyear('date').alias('week')).agg(sum('sales').alias('sum_sales')).show()

This particular example groups the rows of the DataFrame by week based on the date in the **date** column and then calculates the sum of the values in the **sales** column by week.

The following example shows how to use this syntax in practice.

**Example: How to Group by Week in PySpark**

Suppose we have the following PySpark DataFrame that contains information about the sales made on various days at some company:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['2023-04-11', 22],
        ['2023-04-15', 14],
        ['2023-04-17', 12],
        ['2023-05-21', 15],
        ['2023-05-23', 30],
        ['2023-10-26', 45],
        ['2023-10-28', 32],
        ['2023-10-29', 47]]

#define column names
columns = ['date', 'sales']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#view dataframe
df.show()

+----------+-----+
|      date|sales|
+----------+-----+
|2023-04-11|   22|
|2023-04-15|   14|
|2023-04-17|   12|
|2023-05-21|   15|
|2023-05-23|   30|
|2023-10-26|   45|
|2023-10-28|   32|
|2023-10-29|   47|
+----------+-----+
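Note that the **date** column in this DataFrame is stored as a string. The weekofyear function will implicitly cast a string in yyyy-MM-dd format, but if you'd like to make the date type explicit you could first convert the column with to_date. A minimal sketch:

from pyspark.sql.functions import to_date

#convert the date column from string to an actual DateType column (assumes yyyy-MM-dd format)
df = df.withColumn('date', to_date('date', 'yyyy-MM-dd'))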

Suppose we would like to calculate the sum of the sales, grouped by week.

We can use the following syntax to do so:

from pyspark.sql.functions import weekofyear, sum

#calculate sum of sales by week
df.groupBy(weekofyear('date').alias('week')).agg(sum('sales').alias('sum_sales')).show()

+----+---------+
|week|sum_sales|
+----+---------+
|  15|       36|
|  16|       12|
|  20|       15|
|  21|       30|
|  43|      124|
+----+---------+

The resulting DataFrame shows the sum of sales by week.
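Note that groupBy does not guarantee any particular ordering of the output rows. If you'd like the weeks displayed in ascending order, you can append orderBy before show:

df.groupBy(weekofyear('date').alias('week')).agg(sum('sales').alias('sum_sales')).orderBy('week').show()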

For example, we can see:

- The sum of sales for the 15th week of the year was **36**.
- The sum of sales for the 16th week of the year was **12**.
- The sum of sales for the 20th week of the year was **15**.

And so on.
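One caveat worth keeping in mind: weekofyear returns the ISO week number (1 through 53), so if your DataFrame spans multiple years, rows from different years that share a week number will be grouped together. A sketch of one way to avoid this, assuming you'd rather group by the Monday that starts each week, uses date_trunc (not part of the original example):

from pyspark.sql.functions import date_trunc, to_date, sum

#group by the start date of each week so weeks from different years stay separate
df.groupBy(date_trunc('week', to_date('date')).alias('week_start')).agg(sum('sales').alias('sum_sales')).show()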

Note that you can also aggregate the **sales** column using a different function if you'd like.

For example, you could use the following syntax to calculate the total count of sales, grouped by week:

from pyspark.sql.functions import weekofyear, count

#calculate count of sales by week
df.groupBy(weekofyear('date').alias('week')).agg(count('sales').alias('cnt_sales')).show()

+----+---------+
|week|cnt_sales|
+----+---------+
|  15|        2|
|  16|        1|
|  20|        1|
|  21|        1|
|  43|        3|
+----+---------+

The resulting DataFrame now shows the count of sales by week.
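You can also compute several aggregations in a single pass by passing multiple expressions to agg. For example, the following sketch (using avg and round, which are not part of the original example) calculates the sum, count, and average of sales by week all at once:

from pyspark.sql.functions import weekofyear, sum, count, avg, round

#calculate sum, count, and average of sales by week in one aggregation
df.groupBy(weekofyear('date').alias('week')) \
  .agg(sum('sales').alias('sum_sales'),
       count('sales').alias('cnt_sales'),
       round(avg('sales'), 2).alias('avg_sales')) \
  .show()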

**Additional Resources**

The following tutorials explain how to perform other common tasks in PySpark:

How to Add Days to a Date Column in PySpark

How to Convert String to Date in PySpark

How to Convert Timestamp to Date in PySpark