You can use the following syntax to convert an integer column to a string column in a PySpark DataFrame:
from pyspark.sql.types import StringType
df = df.withColumn('my_string', df['my_integer'].cast(StringType()))
This example creates a new column called my_string that holds the values of the my_integer column cast to strings. The original my_integer column is left unchanged.
The following example shows how to use this syntax in practice.
Example: How to Convert Integer to String in PySpark
Suppose we have the following PySpark DataFrame that contains information about points scored by various basketball players:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
#define data
data = [['A', 11],
['B', 19],
['C', 22],
['D', 25],
['E', 12],
['F', 41],
['G', 32],
['H', 20]]
#define column names
columns = ['team', 'points']
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)
#view dataframe
df.show()
+----+------+
|team|points|
+----+------+
|   A|    11|
|   B|    19|
|   C|    22|
|   D|    25|
|   E|    12|
|   F|    41|
|   G|    32|
|   H|    20|
+----+------+
We can use the following syntax to display the data type of each column in the DataFrame:
#check data type of each column
df.dtypes
[('team', 'string'), ('points', 'bigint')]
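We can see that the points column currently has an integer data type. It is reported as bigint, Spark's 64-bit integer type, because that is the type Spark infers for Python integers by default.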
To convert this column from an integer to a string, we can use the following syntax:
from pyspark.sql.types import StringType
#create string column from integer column
df = df.withColumn('points_string', df['points'].cast(StringType()))
#view updated DataFrame
df.show()
+----+------+-------------+
|team|points|points_string|
+----+------+-------------+
|   A|    11|           11|
|   B|    19|           19|
|   C|    22|           22|
|   D|    25|           25|
|   E|    12|           12|
|   F|    41|           41|
|   G|    32|           32|
|   H|    20|           20|
+----+------+-------------+
We can check the dtypes attribute once again to view the data type of each column in the DataFrame:
#check data type of each column
df.dtypes
[('team', 'string'), ('points', 'bigint'), ('points_string', 'string')]
We can see that the points_string column has a data type of string.
We have successfully created a string column from an integer column.
Additional Resources
The following tutorials explain how to perform other common tasks in PySpark:
How to Convert String to Integer in PySpark
How to Convert String to Date in PySpark
How to Convert String to Timestamp in PySpark