You can use the following syntax to create a pivot table from a PySpark DataFrame:

```python
df.groupBy('team').pivot('position').sum('points').show()
```

This particular example creates a pivot table using the **team** column as the rows, the **position** column as the columns, and the sum of the **points** column as the values within the pivot table.

The following example shows how to use this syntax in practice.

**Example: How to Create Pivot Table in PySpark**

Suppose we have the following PySpark DataFrame that contains information about the points scored by various basketball players:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', 'G', 4],
        ['A', 'G', 4],
        ['A', 'F', 6],
        ['A', 'F', 8],
        ['B', 'G', 9],
        ['B', 'F', 5],
        ['B', 'F', 5],
        ['B', 'F', 12]]

#define column names
columns = ['team', 'position', 'points']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#view dataframe
df.show()
```

```
+----+--------+------+
|team|position|points|
+----+--------+------+
|   A|       G|     4|
|   A|       G|     4|
|   A|       F|     6|
|   A|       F|     8|
|   B|       G|     9|
|   B|       F|     5|
|   B|       F|     5|
|   B|       F|    12|
+----+--------+------+
```

We can use the following syntax to create a pivot table using **team** as the rows, **position** as the columns and the sum of **points** as the values within the pivot table:

```python
#create pivot table that shows sum of points by team and position
df.groupBy('team').pivot('position').sum('points').show()
```

```
+----+---+---+
|team|  F|  G|
+----+---+---+
|   B| 22|  9|
|   A| 14|  8|
+----+---+---+
```

The resulting pivot table shows the sum of the points values for each team and position.

For example, we can see:

- Players on team B in position F scored a total of **22** points.
- Players on team B in position G scored a total of **9** points.
- Players on team A in position F scored a total of **14** points.
- Players on team A in position G scored a total of **8** points.

Note that we could also use a different metric to summarize the points values if we’d like.

For example, we could use **mean** instead of **sum**:

```python
#create pivot table that shows mean of points by team and position
df.groupBy('team').pivot('position').mean('points').show()
```

```
+----+-----------------+---+
|team|                F|  G|
+----+-----------------+---+
|   B|7.333333333333333|9.0|
|   A|              7.0|4.0|
+----+-----------------+---+
```

The resulting pivot table shows the mean of the points values for each team and position.

For example, we can see:

- Players on team B in position F scored a mean of **7.33** points.
- Players on team B in position G scored a mean of **9** points.
- Players on team A in position F scored a mean of **7** points.
- Players on team A in position G scored a mean of **4** points.

Feel free to use whichever summary metric you would like when creating your own pivot table.

**Additional Resources**

The following tutorials explain how to perform other common tasks in PySpark:

How to Sum Multiple Columns in PySpark DataFrame

How to Add Multiple Columns to PySpark DataFrame

How to Add New Rows to PySpark DataFrame