You can use the following methods to drop the first column from a PySpark DataFrame:
Method 1: Drop First Column by Index Position
#create new DataFrame that drops first column by index position
df_new = df.drop(df.columns[0])
Method 2: Drop First Column by Name
#create new DataFrame that drops first column by name
#('col1' is a placeholder for the actual name of the first column)
df_new = df.drop('col1')
The examples below show how to use each of these methods in practice with this PySpark DataFrame:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', 'East', 11, 4],
        ['A', 'East', 8, 9],
        ['A', 'East', 10, 3],
        ['B', 'West', 6, 12],
        ['B', 'West', 6, 4],
        ['C', 'East', 5, 2]]

#define column names
columns = ['team', 'conference', 'points', 'assists']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#view dataframe
df.show()

+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
|   A|      East|    11|      4|
|   A|      East|     8|      9|
|   A|      East|    10|      3|
|   B|      West|     6|     12|
|   B|      West|     6|      4|
|   C|      East|     5|      2|
+----+----------+------+-------+
Example 1: Drop First Column in PySpark by Index Position
We can use the following syntax to drop the first column in the DataFrame by index position:
#create new DataFrame that drops first column by index position
df_new = df.drop(df.columns[0])

#view new DataFrame
df_new.show()

+----------+------+-------+
|conference|points|assists|
+----------+------+-------+
|      East|    11|      4|
|      East|     8|      9|
|      East|    10|      3|
|      West|     6|     12|
|      West|     6|      4|
|      East|     5|      2|
+----------+------+-------+
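Notice that only the first column (the team column) has been dropped from the new DataFrame. The original df is unchanged, because drop returns a new DataFrame rather than modifying the existing one.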
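The index-based approach works because df.columns is an ordinary Python list of column names, while drop itself takes names rather than positions. A minimal sketch of wrapping that lookup in a reusable function might look like the following. The function name drop_column_at and its position parameter are hypothetical, not part of the PySpark API, and the range check is an added assumption meant to give a clearer error than a bare list lookup would:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported for type hints only; pyspark is not imported at runtime here.
    from pyspark.sql import DataFrame


def drop_column_at(df: "DataFrame", position: int = 0) -> "DataFrame":
    """Return a new DataFrame without the column at `position` (0-based).

    Hypothetical helper for illustration, not part of the PySpark API.
    """
    names = df.columns  # list of column names in schema order
    if not 0 <= position < len(names):
        raise IndexError(
            f"position {position} is out of range for {len(names)} column(s)"
        )
    # DataFrame.drop works with column names, so translate the position first.
    return df.drop(names[position])
```

With the DataFrame above, drop_column_at(df) should give the same result as df.drop(df.columns[0]), leaving the conference, points and assists columns.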
Example 2: Drop First Column in PySpark by Name
We can use the following syntax to drop the first column in the DataFrame by name:
#create new DataFrame that drops first column by name
df_new = df.drop('team')

#view new DataFrame
df_new.show()

+----------+------+-------+
|conference|points|assists|
+----------+------+-------+
|      East|    11|      4|
|      East|     8|      9|
|      East|    10|      3|
|      West|     6|     12|
|      West|     6|      4|
|      East|     5|      2|
+----------+------+-------+
Notice that only the team column has been dropped from the DataFrame, which matches the result of dropping the first column by index position.
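One thing to keep in mind when dropping by name: drop is a no-op if the DataFrame has no column with the given name, so a typo such as df.drop('tema') returns all columns without raising an error. If you would rather fail loudly, a small guard along these lines is one option. The name drop_column_strict is hypothetical, and the check assumes an exact, case-sensitive match against df.columns, which is stricter than Spark's default name resolution:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported for type hints only; pyspark is not imported at runtime here.
    from pyspark.sql import DataFrame


def drop_column_strict(df: "DataFrame", name: str) -> "DataFrame":
    """Drop column `name`, raising an error if it is not present.

    Hypothetical helper for illustration, not part of the PySpark API.
    Assumption: `name` must match an entry of df.columns exactly.
    """
    if name not in df.columns:
        raise ValueError(
            f"column {name!r} not found; available columns: {df.columns}"
        )
    # The name exists, so drop removes it and returns a new DataFrame.
    return df.drop(name)
```

With the example DataFrame, drop_column_strict(df, 'team') should behave like df.drop('team'), while drop_column_strict(df, 'tema') raises a ValueError instead of silently returning every column.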
Additional Resources
The following tutorials explain how to perform other common tasks in PySpark:
PySpark: How to Select Rows Based on Column Values
PySpark: How to Select Rows by Index in DataFrame
PySpark: How to Find Unique Values in a Column