You can use the following methods to count null values in a PySpark DataFrame:

**Method 1: Count Null Values in One Column**

```python
#count number of null values in 'points' column
df.where(df.points.isNull()).count()
```

**Method 2: Count Null Values in Each Column**

```python
from pyspark.sql.functions import when, count, col

#count number of null values in each column of DataFrame
df.select([count(when(col(c).isNull(), c)).alias(c) for c in df.columns]).show()
```

The following examples show how to use each method in practice with a PySpark DataFrame that contains information about various basketball players:

```python
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', None, 11],
        ['A', 4, 8],
        ['A', 2, 22],
        ['A', 10, None],
        ['B', 8, None],
        ['B', 11, 14],
        ['B', 14, 13],
        ['B', 6, 7],
        ['C', 2, 8],
        ['C', 2, 5]]

#define column names
columns = ['team', 'assists', 'points']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#view dataframe
df.show()
```

```
+----+-------+------+
|team|assists|points|
+----+-------+------+
|   A|   null|    11|
|   A|      4|     8|
|   A|      2|    22|
|   A|     10|  null|
|   B|      8|  null|
|   B|     11|    14|
|   B|     14|    13|
|   B|      6|     7|
|   C|      2|     8|
|   C|      2|     5|
+----+-------+------+
```

**Example 1: Count Null Values in One Column**

We can use the following syntax to count the number of null values in just the **points** column of the DataFrame:

```python
#count number of null values in 'points' column
df.where(df.points.isNull()).count()
```

```
2
```

From the output we can see there are **2** null values in the **points** column of the DataFrame.

Note that if we wanted to view these rows with null values in the **points** column then we could replace **count()** with **show()** as follows:

```python
#display rows with null values in 'points' column
df.where(df.points.isNull()).show()
```

```
+----+-------+------+
|team|assists|points|
+----+-------+------+
|   A|     10|  null|
|   B|      8|  null|
+----+-------+------+
```

The resulting DataFrame contains only the two rows with null values in the **points** column.

**Example 2: Count Null Values in Each Column**

We can use the following syntax to count the number of null values in each column of the DataFrame:

```python
from pyspark.sql.functions import when, count, col

#count number of null values in each column of DataFrame
df.select([count(when(col(c).isNull(), c)).alias(c) for c in df.columns]).show()
```

```
+----+-------+------+
|team|assists|points|
+----+-------+------+
|   0|      1|     2|
+----+-------+------+
```

From the output we can see:

- There are **0** null values in the **team** column.
- There is **1** null value in the **assists** column.
- There are **2** null values in the **points** column.

**Additional Resources**

The following tutorials explain how to perform other common tasks in PySpark:

PySpark: How to Use “OR” Operator

PySpark: How to Use “AND” Operator

PySpark: How to Use “NOT IN” Operator