PySpark: How to Check if DataFrame is Empty


You can use the following syntax to check if a PySpark DataFrame is empty:

print(df.count() == 0)

This prints True if the DataFrame is empty and False if it is not.

Note that df.count() returns the number of rows in the DataFrame, so we’re simply checking whether the row count is equal to zero.
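Because df.count() has to count every row, a check that stops at the first row may be cheaper on a large DataFrame. The following is a minimal sketch of that idea, not a definitive implementation. The helper name is_empty is hypothetical, and it assumes df is a PySpark DataFrame, whose head(1) method returns a list containing at most one row:

```python
def is_empty(df):
    """Return True if the DataFrame has no rows.

    Assumes df is a PySpark DataFrame. head(1) returns a list
    holding at most one Row, so an empty list means no rows.
    """
    return len(df.head(1)) == 0
```

Calling is_empty(df) can then take the place of df.count() == 0.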

The following examples show how to use this syntax in practice.

Example 1: Check if Empty DataFrame is Empty

Suppose we create the following empty PySpark DataFrame with specific column names:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

from pyspark.sql.types import StructType, StructField, StringType, FloatType

#create empty RDD
empty_rdd=spark.sparkContext.emptyRDD()

#specify column names and types
my_columns=[StructField('team', StringType(),True),
            StructField('position', StringType(),True),
            StructField('points', FloatType(),True)]

#create DataFrame with specific column names
df=spark.createDataFrame(empty_rdd, schema=StructType(my_columns))

#view DataFrame
df.show()

+----+--------+------+
|team|position|points|
+----+--------+------+
+----+--------+------+

We can use the following syntax to check if the DataFrame is empty:

#check if DataFrame is empty
print(df.count() == 0)

True

We receive a value of True, which indicates that the DataFrame is indeed empty.

Example 2: Check if Non-Empty DataFrame is Empty

Suppose we create the following PySpark DataFrame that contains information about various basketball players:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['Mavs', 18], 
        ['Nets', 33], 
        ['Lakers', 12], 
        ['Mavs', 15], 
        ['Cavs', 19],
        ['Wizards', 24],]
  
#define column names
columns = ['team', 'points'] 
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+-------+------+
|   team|points|
+-------+------+
|   Mavs|    18|
|   Nets|    33|
| Lakers|    12|
|   Mavs|    15|
|   Cavs|    19|
|Wizards|    24|
+-------+------+

We can use the following syntax to check if the DataFrame is empty:

#check if DataFrame is empty
print(df.count() == 0)

False

We receive a value of False, which indicates that the DataFrame is not empty.
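Newer versions of PySpark (3.3.0 and later) also provide a built-in DataFrame.isEmpty() method. The sketch below uses it when it is available and otherwise falls back to the df.count() == 0 check used in the examples above. The helper name is_empty_compat is hypothetical and only serves as an illustration:

```python
def is_empty_compat(df):
    """Return True if df has no rows, preferring the built-in isEmpty()."""
    # DataFrame.isEmpty() was added in PySpark 3.3.0
    if hasattr(df, "isEmpty"):
        return df.isEmpty()
    # older versions: fall back to the row-count check
    return df.count() == 0
```

For the DataFrame in this example, is_empty_compat(df) should agree with the df.count() == 0 check.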

Additional Resources

The following tutorials explain how to perform other common tasks in PySpark:

PySpark: How to Create New DataFrame from Existing DataFrame
PySpark: How to Select Rows by Index in DataFrame
PySpark: How to Select Columns by Index in DataFrame
