You can use the following syntax to convert a column to uppercase in a PySpark DataFrame:
from pyspark.sql.functions import upper
df = df.withColumn('my_column', upper(df['my_column']))
The following example shows how to use this syntax in practice.
Example: How to Convert Column to Uppercase in PySpark
Suppose we create the following PySpark DataFrame that contains information about various basketball players:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
#define data
data = [['A', 'East', 11, 4],
        ['A', 'East', 8, 9],
        ['A', 'East', 10, 3],
        ['B', 'West', 6, 12],
        ['B', 'West', 6, 4],
        ['C', 'East', 5, 2]]
#define column names
columns = ['team', 'conference', 'points', 'assists']
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)
#view dataframe
df.show()
+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
|   A|      East|    11|      4|
|   A|      East|     8|      9|
|   A|      East|    10|      3|
|   B|      West|     6|     12|
|   B|      West|     6|      4|
|   C|      East|     5|      2|
+----+----------+------+-------+
Suppose we would like to convert all strings in the conference column to uppercase.
We can use the following syntax to do so:
from pyspark.sql.functions import upper
#convert 'conference' column to uppercase
df = df.withColumn('conference', upper(df['conference']))
#view updated DataFrame
df.show()
+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
|   A|      EAST|    11|      4|
|   A|      EAST|     8|      9|
|   A|      EAST|    10|      3|
|   B|      WEST|     6|     12|
|   B|      WEST|     6|      4|
|   C|      EAST|     5|      2|
+----+----------+------+-------+
Notice that all strings in the conference column of the updated DataFrame are now uppercase.
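Note #1: We used the withColumn method to return a new DataFrame in which the conference column is replaced by its uppercase version and all other columns are left unchanged. Because 'conference' is the name of an existing column, withColumn overwrites that column rather than adding a new one.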
Note #2: You can find the complete documentation for the PySpark withColumn function in the official PySpark API reference.
Additional Resources
The following tutorials explain how to perform other common tasks in PySpark:
PySpark: How to Select Multiple Columns
PySpark: How to Select Columns with Alias
PySpark: How to Select Columns by Index