You can use the following methods in PySpark to check if a particular column exists in a DataFrame:
Method 1: Check if Column Exists (Case-Sensitive)
'points' in df.columns
Method 2: Check if Column Exists (Not Case-Sensitive)
'points'.upper() in (name.upper() for name in df.columns)
The examples below show how to use each method in practice with this PySpark DataFrame:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', 'East', 11, 4],
        ['A', None, 8, 9],
        ['A', 'East', 10, 3],
        ['B', 'West', None, 12],
        ['B', 'West', None, 4],
        ['C', 'East', 5, 2]]

#define column names
columns = ['team', 'conference', 'points', 'assists']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#view dataframe
df.show()

+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
|   A|      East|    11|      4|
|   A|      null|     8|      9|
|   A|      East|    10|      3|
|   B|      West|  null|     12|
|   B|      West|  null|      4|
|   C|      East|     5|      2|
+----+----------+------+-------+
Example 1: Check if Column Exists (Case-Sensitive)
We can use the following syntax to check if the column name points exists in the DataFrame:
#check if column name 'points' exists in the DataFrame
'points' in df.columns

True
The output returns True since the column name points does indeed exist in the DataFrame.
Note that this syntax is case-sensitive. If we search for the column name Points instead, the output is False because the case we searched for doesn’t exactly match the case of the column name in the DataFrame:
#check if column name 'Points' exists in the DataFrame
'Points' in df.columns

False
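The same membership test extends to several column names at once. The following is a minimal sketch, not part of the original examples: it uses a plain Python list as a stand-in for df.columns (which returns a list of strings), and missing_columns is a hypothetical helper name chosen for illustration.

```python
# Stand-in for df.columns, which is a plain Python list of column names
columns = ['team', 'conference', 'points', 'assists']

def missing_columns(required, existing):
    """Return the names in `required` that are not in `existing` (case-sensitive)."""
    existing_set = set(existing)
    return [name for name in required if name not in existing_set]

# 'Points' is reported as missing because the comparison is case-sensitive
missing = missing_columns(['points', 'Points', 'rebounds'], columns)
print(missing)  # ['Points', 'rebounds']
```

With a real DataFrame, you would pass df.columns as the second argument. An empty list means every required column is present.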
Example 2: Check if Column Exists (Not Case-Sensitive)
We can use the following syntax to check if the column name Points exists in the DataFrame, regardless of case:
#check if column name 'Points' exists in the DataFrame
'Points'.upper() in (name.upper() for name in df.columns)

True
The output returns True even though the case of the column name that we searched for didn’t precisely match the column name of points in the DataFrame.
Note: In this example we used the upper() string method to convert both our search term and every column name in the DataFrame to uppercase before comparing them.
This allowed us to perform a case-insensitive search.
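A case-insensitive check is often followed by a need for the column's actual name, for example to select it. The sketch below is one possible way to do this and is not part of the original examples: find_column is a hypothetical helper, the list stands in for df.columns, and it uses Python's str.casefold() in place of upper(), which behaves the same for ASCII names like these.

```python
# Stand-in for df.columns, which is a plain Python list of column names
columns = ['team', 'conference', 'points', 'assists']

def find_column(name, existing):
    """Return the actual column name that matches `name` ignoring case, or None."""
    target = name.casefold()
    for col in existing:
        if col.casefold() == target:
            return col
    return None

print(find_column('Points', columns))    # points
print(find_column('rebounds', columns))  # None
```

With a real DataFrame, the returned name could then be passed to df.select(), after checking that the result is not None.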