You can use the following syntax to add a string to each value in a column of a PySpark DataFrame:
from pyspark.sql.functions import concat, col, lit
#add the string 'team_name_' to each string in the team column
df_new = df.withColumn('team', concat(lit('team_name_'), col('team')))
This particular example adds the string ‘team_name_’ to the beginning of each string in the team column and stores the result in a new DataFrame named df_new.
The following example shows how to use this syntax in practice.
Example: Add String to Each Value in Column in PySpark
Suppose we have the following PySpark DataFrame that contains information about various basketball players:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
#define data
data = [['A', 'East', 11],
['A', 'East', 8],
['A', 'East', 10],
['B', 'West', 6],
['B', 'West', 6],
['C', 'East', 5],
['C', 'East', 15],
['C', 'West', 31],
['D', 'West', 24]]
#define column names
columns = ['team', 'conference', 'points']
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)
#view dataframe
df.show()
+----+----------+------+
|team|conference|points|
+----+----------+------+
|   A|      East|    11|
|   A|      East|     8|
|   A|      East|    10|
|   B|      West|     6|
|   B|      West|     6|
|   C|      East|     5|
|   C|      East|    15|
|   C|      West|    31|
|   D|      West|    24|
+----+----------+------+
Suppose we would like to add the string ‘team_name_’ to the beginning of each string in the team column.
We can use the following syntax to do so:
from pyspark.sql.functions import concat, col, lit
#add the string 'team_name_' to each string in the team column
df_new = df.withColumn('team', concat(lit('team_name_'), col('team')))
#view new DataFrame
df_new.show()
+-----------+----------+------+
|       team|conference|points|
+-----------+----------+------+
|team_name_A|      East|    11|
|team_name_A|      East|     8|
|team_name_A|      East|    10|
|team_name_B|      West|     6|
|team_name_B|      West|     6|
|team_name_C|      East|     5|
|team_name_C|      East|    15|
|team_name_C|      West|    31|
|team_name_D|      West|    24|
+-----------+----------+------+
Notice that the string ‘team_name_’ has been added to each existing string in the team column of the DataFrame.
Note: You can find the complete documentation for the PySpark concat function in the official PySpark API reference.
Additional Resources
The following tutorials explain how to perform other common tasks in PySpark:
PySpark: How to Concatenate Columns
PySpark: How to Check if Column Contains String
PySpark: How to Replace String in Column
PySpark: How to Convert String to Integer