You can use the following syntax to calculate the max value across multiple columns in a PySpark DataFrame:

    from pyspark.sql.functions import greatest

    #find max value across columns 'game1', 'game2', and 'game3'
    df_new = df.withColumn('max', greatest('game1', 'game2', 'game3'))

This particular example creates a new column called **max** that contains, for each row, the maximum value across the **game1**, **game2** and **game3** columns in the DataFrame.
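As a mental model, **greatest** works row by row: it compares the values in the listed columns, skips nulls, and returns null only when every input is null. It also needs at least two columns. The plain-Python sketch below mimics that behavior on a few made-up rows; the helper name `row_greatest` and the sample rows are hypothetical, and this is not how Spark executes the function internally.

```python
# Plain-Python model of a row-wise maximum (illustrative only;
# Spark does not run greatest() this way).
def row_greatest(*values):
    """Return the largest non-None value, or None if every value is None."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


# Hypothetical rows of (game1, game2, game3) scores, including missing values
rows = [(25, 11, 10), (22, 8, 14), (14, None, 10), (None, None, None)]

row_maxes = [row_greatest(*row) for row in rows]
print(row_maxes)  # [25, 22, 14, None]
```

The third row shows a missing score being ignored, and the last row shows the all-null case.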

The following example shows how to use this syntax in practice.

**Example: How to Calculate Max Value Across Columns in PySpark**

Suppose we have the following PySpark DataFrame that contains information about points scored by various basketball players during three different games:

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()

    #define data
    data = [['Mavs', 25, 11, 10],
            ['Nets', 22, 8, 14],
            ['Hawks', 14, 22, 10],
            ['Kings', 30, 22, 35],
            ['Bulls', 15, 14, 12],
            ['Blazers', 10, 14, 18]]

    #define column names
    columns = ['team', 'game1', 'game2', 'game3']

    #create dataframe using data and column names
    df = spark.createDataFrame(data, columns)

    #view dataframe
    df.show()

    +-------+-----+-----+-----+
    |   team|game1|game2|game3|
    +-------+-----+-----+-----+
    |   Mavs|   25|   11|   10|
    |   Nets|   22|    8|   14|
    |  Hawks|   14|   22|   10|
    |  Kings|   30|   22|   35|
    |  Bulls|   15|   14|   12|
    |Blazers|   10|   14|   18|
    +-------+-----+-----+-----+

Suppose we would like to add a new column called **max** that contains the maximum number of points scored by each player across all three games.

We can use the following syntax to do so:

    from pyspark.sql.functions import greatest

    #find max value across columns 'game1', 'game2', and 'game3'
    df_new = df.withColumn('max', greatest('game1', 'game2', 'game3'))

    #view new DataFrame
    df_new.show()

    +-------+-----+-----+-----+---+
    |   team|game1|game2|game3|max|
    +-------+-----+-----+-----+---+
    |   Mavs|   25|   11|   10| 25|
    |   Nets|   22|    8|   14| 22|
    |  Hawks|   14|   22|   10| 22|
    |  Kings|   30|   22|   35| 35|
    |  Bulls|   15|   14|   12| 15|
    |Blazers|   10|   14|   18| 18|
    +-------+-----+-----+-----+---+

Notice that the new **max** column contains the maximum value in each row across the **game1**, **game2** and **game3** columns.

For example:

- The max of points for the **Mavs** player is **25**
- The max of points for the **Nets** player is **22**
- The max of points for the **Hawks** player is **22**

And so on.

Note that we used the **withColumn** function to return a new DataFrame with the **max** column added and all other columns left unchanged.

You can find the complete documentation for the PySpark **withColumn** function in the official PySpark API reference.

**Additional Resources**

The following tutorials explain how to perform other common tasks in PySpark:

How to Calculate the Mean of a Column in PySpark

How to Calculate Mean of Multiple Columns in PySpark

How to Sum Multiple Columns in PySpark