How to Reorder Columns in PySpark (With Examples)


You can use the following methods to reorder columns in a PySpark DataFrame:

Method 1: Reorder Columns in Specific Order

df = df.select('col3', 'col2', 'col4', 'col1')

Method 2: Reorder Columns Alphabetically

df = df.select(sorted(df.columns))

The following examples show how to use each method with this PySpark DataFrame:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', 'East', 11, 4], 
        ['A', 'East', 8, 9], 
        ['A', 'East', 10, 3], 
        ['B', 'West', 6, 12], 
        ['B', 'West', 6, 4], 
        ['C', 'East', 5, 2]] 
  
#define column names
columns = ['team', 'conference', 'points', 'assists'] 
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
|   A|      East|    11|      4|
|   A|      East|     8|      9|
|   A|      East|    10|      3|
|   B|      West|     6|     12|
|   B|      West|     6|      4|
|   C|      East|     5|      2|
+----+----------+------+-------+

Example 1: Reorder Columns in Specific Order

We can use the following syntax to reorder the columns in the DataFrame based on a specific order:

#reorder columns by specific order
df = df.select('conference', 'team', 'assists', 'points')

#view updated DataFrame
df.show()

+----------+----+-------+------+
|conference|team|assists|points|
+----------+----+-------+------+
|      East|   A|      4|    11|
|      East|   A|      9|     8|
|      East|   A|      3|    10|
|      West|   B|     12|     6|
|      West|   B|      4|     6|
|      East|   C|      2|     5|
+----------+----+-------+------+

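The columns now appear in the exact order that we specified. Note that select() returns only the columns passed to it, so any column left out of the list is dropped from the result.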
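
A common variation is to move only one or two columns to the front and leave the rest in their existing order. The sketch below shows one way to build that list of names in plain Python. The helper name move_to_front is hypothetical and not part of PySpark, and the columns list is typed out by hand here to match the original example DataFrame; in practice it would come from df.columns.

```python
def move_to_front(columns, first):
    """Return the names in `first`, then the remaining columns in their original order.

    Hypothetical helper for illustration; not part of the PySpark API.
    """
    #fail early if a requested name is not an existing column
    missing = [c for c in first if c not in columns]
    if missing:
        raise ValueError(f"Unknown column(s): {missing}")
    return list(first) + [c for c in columns if c not in first]

#column names of the original example DataFrame
columns = ['team', 'conference', 'points', 'assists']

#put 'points' and 'assists' first, keep the others in their existing order
new_order = move_to_front(columns, ['points', 'assists'])
print(new_order)
```

The resulting list can then be passed to select(), for example df = df.select(new_order), in the same way that a sorted list of names is passed in Method 2.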

Example 2: Reorder Columns Alphabetically

We can use the following syntax to reorder the columns in the DataFrame alphabetically:

#reorder columns alphabetically
df = df.select(sorted(df.columns)) 

#view updated DataFrame
df.show()

+-------+----------+------+----+
|assists|conference|points|team|
+-------+----------+------+----+
|      4|      East|    11|   A|
|      9|      East|     8|   A|
|      3|      East|    10|   A|
|     12|      West|     6|   B|
|      4|      West|     6|   B|
|      2|      East|     5|   C|
+-------+----------+------+----+

The columns now appear in alphabetical order.
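
One caveat: Python's sorted() compares strings by character code, so uppercase letters sort before lowercase ones. If the column names mix cases, a key function can give a case-insensitive order instead. The sketch below uses a made-up list of column names (not the example DataFrame) to show the difference.

```python
#hypothetical column names with mixed case (not from the example DataFrame)
columns = ['team', 'Points', 'assists', 'conference']

#default sort: names starting with an uppercase letter come first
default_order = sorted(columns)

#case-insensitive sort: compare the lowercased names instead
insensitive_order = sorted(columns, key=str.lower)

print(default_order)
print(insensitive_order)
```

Either list can be passed to df.select() in the same way as sorted(df.columns).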

Additional Resources

The following tutorials explain how to perform other common tasks in PySpark:

PySpark: How to Select Rows Based on Column Values
PySpark: How to Select Columns by Index in DataFrame
PySpark: How to Select Rows by Index in DataFrame
