You can use the following syntax to calculate the sum of values in each row of a PySpark DataFrame:

```python
from pyspark.sql import functions as F

#add new column that contains sum of each row
df_new = df.withColumn('row_sum', sum([F.col(c) for c in df.columns]))
```

This particular example creates a new column named **row_sum** that contains the sum of values in each row.

The following example shows how to use this syntax in practice.

**Example: How to Calculate Sum of Each Row in PySpark**

Suppose we have the following PySpark DataFrame that shows the number of points scored in three different games by various basketball players:

```python
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [[14, 16, 10],
        [12, 10, 13],
        [8, 10, 20],
        [15, 15, 15],
        [19, 3, 15],
        [24, 40, 23],
        [15, 12, 19],
        [10, 10, 16]]

#define column names
columns = ['game1', 'game2', 'game3']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#view dataframe
df.show()
```

```
+-----+-----+-----+
|game1|game2|game3|
+-----+-----+-----+
|   14|   16|   10|
|   12|   10|   13|
|    8|   10|   20|
|   15|   15|   15|
|   19|    3|   15|
|   24|   40|   23|
|   15|   12|   19|
|   10|   10|   16|
+-----+-----+-----+
```

We can use the following syntax to create a new column named **row_sum** that contains the sum of the values in each row:

```python
from pyspark.sql import functions as F

#add new column that contains sum of each row
df_new = df.withColumn('row_sum', sum([F.col(c) for c in df.columns]))

#view new DataFrame
df_new.show()
```

```
+-----+-----+-----+-------+
|game1|game2|game3|row_sum|
+-----+-----+-----+-------+
|   14|   16|   10|     40|
|   12|   10|   13|     35|
|    8|   10|   20|     38|
|   15|   15|   15|     45|
|   19|    3|   15|     37|
|   24|   40|   23|     87|
|   15|   12|   19|     46|
|   10|   10|   16|     36|
+-----+-----+-----+-------+
```

The new column named **row_sum** contains the sum of the values in each row.

For example:

- The sum of values in the first row is 14 + 16 + 10 = **40**.
- The sum of values in the second row is 12 + 10 + 13 = **35**.
- The sum of values in the third row is 8 + 10 + 20 = **38**.

And so on.

**Note**: The built-in Python **sum** used here adds the columns together with `+`, and in Spark SQL any addition involving a null produces null. So if a row contains a null in any of the columns, its **row_sum** will be null rather than the sum of the non-null values.

**Additional Resources**

The following tutorials explain how to perform other common tasks in PySpark:

How to Sum Multiple Columns in PySpark

How to Sum Column Based on a Condition in PySpark

How to Calculate Sum by Group in PySpark