You can use the following syntax to sum the values across multiple columns in a PySpark DataFrame:

    from pyspark.sql import functions as F

    # define columns to sum
    cols_to_sum = ['game1', 'game2', 'game3']

    # create new DataFrame that contains sum of specific columns
    df_new = df.withColumn('sum', F.expr('+'.join(cols_to_sum)))

This particular example creates a new column called **sum** that contains the sum of values across the **game1**, **game2** and **game3** columns in the DataFrame.
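To see why this works, it can help to look at the string that `'+'.join(cols_to_sum)` produces before `F.expr` parses it as a SQL expression. The sketch below is plain Python and needs no Spark session. `build_sum_expr` is a hypothetical helper, not part of PySpark; it assumes you want to wrap each name in backticks, which is how Spark SQL quotes column names that contain spaces or other special characters. It does not handle names that themselves contain a backtick.

```python
cols_to_sum = ['game1', 'game2', 'game3']

# The join produces an ordinary SQL expression string
expr_string = '+'.join(cols_to_sum)
print(expr_string)  # game1+game2+game3


def build_sum_expr(cols):
    """Hypothetical helper: backtick-quote each column name, then join with +."""
    return ' + '.join(f'`{c}`' for c in cols)


# Column names with spaces would break the plain join, so quote them
quoted_expr = build_sum_expr(['game 1', 'game 2'])
print(quoted_expr)  # `game 1` + `game 2`
```

Either string can then be passed to `F.expr` in place of the plain join.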

The following example shows how to use this syntax in practice.

**Example: How to Sum Multiple Columns in PySpark**

Suppose we have the following PySpark DataFrame that contains information about points scored by various basketball teams during three different games:

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()

    # define data
    data = [['Mavs', 25, 11, 10],
            ['Nets', 22, 8, 14],
            ['Hawks', 14, 22, 10],
            ['Kings', 30, 22, 35],
            ['Bulls', 15, 14, 12],
            ['Blazers', 10, 14, 18]]

    # define column names
    columns = ['team', 'game1', 'game2', 'game3']

    # create dataframe using data and column names
    df = spark.createDataFrame(data, columns)

    # view dataframe
    df.show()

    +-------+-----+-----+-----+
    |   team|game1|game2|game3|
    +-------+-----+-----+-----+
    |   Mavs|   25|   11|   10|
    |   Nets|   22|    8|   14|
    |  Hawks|   14|   22|   10|
    |  Kings|   30|   22|   35|
    |  Bulls|   15|   14|   12|
    |Blazers|   10|   14|   18|
    +-------+-----+-----+-----+

Suppose we would like to add a new column called **sum** that contains the sum of points scored by each team across all three games.

We can use the following syntax to do so:

    from pyspark.sql import functions as F

    # define columns to sum
    cols_to_sum = ['game1', 'game2', 'game3']

    # create new DataFrame that contains sum of specific columns
    df_new = df.withColumn('sum', F.expr('+'.join(cols_to_sum)))

    # view new DataFrame
    df_new.show()

    +-------+-----+-----+-----+---+
    |   team|game1|game2|game3|sum|
    +-------+-----+-----+-----+---+
    |   Mavs|   25|   11|   10| 46|
    |   Nets|   22|    8|   14| 44|
    |  Hawks|   14|   22|   10| 46|
    |  Kings|   30|   22|   35| 87|
    |  Bulls|   15|   14|   12| 41|
    |Blazers|   10|   14|   18| 42|
    +-------+-----+-----+-----+---+

Notice that the new **sum** column contains the sum of values across the **game1**, **game2** and **game3** columns.

For example:

- The sum of points for the **Mavs** is 25 + 11 + 10 = **46**
- The sum of points for the **Nets** is 22 + 8 + 14 = **44**
- The sum of points for the **Hawks** is 14 + 22 + 10 = **46**

And so on.
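The row-by-row arithmetic can also be checked without Spark. The following is a minimal sketch in plain Python that reuses the same rows as the example DataFrame and sums the three game values for each team; the `expected` dictionary is a name introduced here only for illustration.

```python
# Same rows as the example DataFrame: team, game1, game2, game3
data = [['Mavs', 25, 11, 10],
        ['Nets', 22, 8, 14],
        ['Hawks', 14, 22, 10],
        ['Kings', 30, 22, 35],
        ['Bulls', 15, 14, 12],
        ['Blazers', 10, 14, 18]]

# Sum the three game columns for each row, keyed by team name
expected = {row[0]: sum(row[1:]) for row in data}

for team, total in expected.items():
    print(team, total)
# Mavs 46
# Nets 44
# Hawks 46
# Kings 87
# Bulls 41
# Blazers 42
```

These totals match the values in the **sum** column shown above.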

Note that we used the **withColumn** method to return a new DataFrame with the **sum** column added and all other columns left the same.

You can find the complete documentation for the PySpark **withColumn** method in the official PySpark API reference.

