You can use the following methods to reorder columns in a PySpark DataFrame:
Method 1: Reorder Columns in Specific Order
df = df.select('col3', 'col2', 'col4', 'col1')
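select() returns only the columns you list. Any column you leave out is dropped from the result, and a name that does not exist raises an error. Before you pass an order to select(), you can check that it is a true rearrangement of the existing columns. Below is a minimal pure-Python sketch of that check. The helper name `check_order` is hypothetical and is not part of PySpark:

```python
def check_order(current, requested):
    """Return `requested` as a list if it contains every name in `current`
    exactly once; otherwise raise ValueError.

    Hypothetical helper for illustration, not a PySpark API.
    `current` would typically be df.columns.
    """
    missing = [c for c in current if c not in requested]
    unknown = [c for c in requested if c not in current]
    duplicates = sorted({c for c in requested if requested.count(c) > 1})
    if missing or unknown or duplicates:
        raise ValueError(
            f"missing={missing}, unknown={unknown}, duplicates={duplicates}"
        )
    return list(requested)


# Column names from the example DataFrame used in this tutorial
columns = ['team', 'conference', 'points', 'assists']
print(check_order(columns, ['conference', 'team', 'assists', 'points']))
```

With a real DataFrame, you would wrap the list you pass to select(), e.g. df = df.select(check_order(df.columns, ['col3', 'col2', 'col4', 'col1'])).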
Method 2: Reorder Columns Alphabetically
df = df.select(sorted(df.columns))
The following examples show how to use each method with this PySpark DataFrame:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', 'East', 11, 4],
        ['A', 'East', 8, 9],
        ['A', 'East', 10, 3],
        ['B', 'West', 6, 12],
        ['B', 'West', 6, 4],
        ['C', 'East', 5, 2]]

#define column names
columns = ['team', 'conference', 'points', 'assists']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#view dataframe
df.show()

+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
|   A|      East|    11|      4|
|   A|      East|     8|      9|
|   A|      East|    10|      3|
|   B|      West|     6|     12|
|   B|      West|     6|      4|
|   C|      East|     5|      2|
+----+----------+------+-------+
Example 1: Reorder Columns in Specific Order
We can use the following syntax to reorder the columns in the DataFrame based on a specific order:
#reorder columns by specific order
df = df.select('conference', 'team', 'assists', 'points')
#view updated DataFrame
df.show()
+----------+----+-------+------+
|conference|team|assists|points|
+----------+----+-------+------+
|      East|   A|      4|    11|
|      East|   A|      9|     8|
|      East|   A|      3|    10|
|      West|   B|     12|     6|
|      West|   B|      4|     6|
|      East|   C|      2|     5|
+----------+----+-------+------+
The columns now appear in the exact order that we specified.
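Listing every name works for a small DataFrame like this one. On a wide DataFrame, it is often easier to name only the columns you want first and keep the rest in their current order. select() accepts a Python list of column names, so you can build that list with ordinary Python. Here is a minimal sketch. `move_to_front` is a hypothetical helper written for illustration, not a PySpark function:

```python
def move_to_front(columns, first):
    """Return a new column order with the names in `first` at the front and
    all other columns kept in their original relative order.

    Hypothetical helper for illustration, not a PySpark API.
    """
    unknown = [c for c in first if c not in columns]
    if unknown:
        raise ValueError(f"unknown columns: {unknown}")
    rest = [c for c in columns if c not in first]
    return list(first) + rest


# Column names of the original example DataFrame
columns = ['team', 'conference', 'points', 'assists']
print(move_to_front(columns, ['assists']))
# ['assists', 'team', 'conference', 'points']
```

With a real DataFrame, you would pass the result to select(), e.g. df = df.select(move_to_front(df.columns, ['assists'])).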
Example 2: Reorder Columns Alphabetically
We can use the following syntax to reorder the columns in the DataFrame alphabetically:
#reorder columns alphabetically
df = df.select(sorted(df.columns))
#view updated DataFrame
df.show()
+-------+----------+------+----+
|assists|conference|points|team|
+-------+----------+------+----+
|      4|      East|    11|   A|
|      9|      East|     8|   A|
|      3|      East|    10|   A|
|     12|      West|     6|   B|
|      4|      West|     6|   B|
|      2|      East|     5|   C|
+-------+----------+------+----+
The columns now appear in alphabetical order.
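Python's sorted() compares strings by Unicode code point, so every uppercase letter sorts before every lowercase one. For example, 'Zone' comes before 'assists'. If your column names mix cases, pass key=str.lower for a case-insensitive order, or reverse=True for descending order. The sketch below uses a made-up list of mixed-case names, purely for illustration:

```python
# Hypothetical mixed-case column names, for illustration only
columns = ['team', 'Zone', 'assists', 'Points']

print(sorted(columns))                 # ['Points', 'Zone', 'assists', 'team']
print(sorted(columns, key=str.lower))  # ['assists', 'Points', 'team', 'Zone']
print(sorted(columns, reverse=True))   # ['team', 'assists', 'Zone', 'Points']
```

With a real DataFrame, the same arguments go inside the select() call, e.g. df = df.select(sorted(df.columns, key=str.lower)).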