The easiest way to select all columns except specific ones in a PySpark DataFrame is to use the drop() function.
Here are two common ways to do so:
Method 1: Select All Columns Except One
#select all columns except 'conference' column
df.drop('conference').show()
Method 2: Select All Columns Except Several Specific Ones
#select all columns except 'conference' and 'assists' columns
df.drop('conference', 'assists').show()
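An equivalent approach, handy when the list of unwanted columns is computed at runtime, is to filter df.columns and pass the remaining names to select(). A minimal sketch of the filtering step (the df.select() call is left as a comment since it requires an active Spark session):

```python
#column names of the example DataFrame
columns = ['team', 'conference', 'points', 'assists']

#build the list of columns to keep by filtering out the unwanted ones
to_drop = {'conference', 'assists'}
keep = [c for c in columns if c not in to_drop]

print(keep)  #['team', 'points']

#in PySpark, unpack the filtered list into select():
#df.select(*keep).show()
```

This produces the same result as df.drop('conference', 'assists') and preserves the original column order.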
The following examples show how to use each method in practice with this PySpark DataFrame:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
#define data
data = [['A', 'East', 11, 4],
['A', 'East', 8, 9],
['A', 'East', 10, 3],
['B', 'West', 6, 12],
['B', 'West', 6, 4],
['C', 'East', 5, 2]]
#define column names
columns = ['team', 'conference', 'points', 'assists']
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)
#view dataframe
df.show()
+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
| A| East| 11| 4|
| A| East| 8| 9|
| A| East| 10| 3|
| B| West| 6| 12|
| B| West| 6| 4|
| C| East| 5| 2|
+----+----------+------+-------+
Example 1: Select All Columns Except One
We can use the following syntax to select all columns in the DataFrame except for the conference column:
#select all columns except 'conference' column
df.drop('conference').show()

+----+------+-------+
|team|points|assists|
+----+------+-------+
|   A|    11|      4|
|   A|     8|      9|
|   A|    10|      3|
|   B|     6|     12|
|   B|     6|      4|
|   C|     5|      2|
+----+------+-------+
Notice that the resulting DataFrame contains all columns from the original DataFrame except for the conference column.
Example 2: Select All Columns Except Several Specific Ones
We can use the following syntax to select all columns in the DataFrame except for the conference and assists columns:
#select all columns except 'conference' and 'assists' columns
df.drop('conference', 'assists').show()

+----+------+
|team|points|
+----+------+
|   A|    11|
|   A|     8|
|   A|    10|
|   B|     6|
|   B|     6|
|   C|     5|
+----+------+
Notice that the resulting DataFrame contains all columns from the original DataFrame except for the conference and assists columns.
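If the columns to remove follow a naming pattern, you can also build the drop list programmatically and unpack it into drop(), since drop() accepts a variable number of column names. A minimal sketch of building the list (the df.drop() call is left as a comment since it requires an active Spark session; the endswith('s') pattern here is just an illustrative assumption):

```python
#column names of the example DataFrame
cols = ['team', 'conference', 'points', 'assists']

#collect every column whose name ends in 's' (illustrative pattern)
drop_these = [c for c in cols if c.endswith('s')]

print(drop_these)  #['points', 'assists']

#in PySpark, unpack the list into drop():
#df.drop(*drop_these).show()
```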
Additional Resources
The following tutorials explain how to perform other common tasks in PySpark:
PySpark: How to Select Columns with Alias
PySpark: How to Select Columns by Index
PySpark: How to Select Multiple Columns