You can use the following syntax to round the values in a column of a PySpark DataFrame to 2 decimal places:
from pyspark.sql.functions import round

#create new column that rounds values in points column to 2 decimal places
df_new = df.withColumn('points2', round(df.points, 2))
This particular example creates a new column named points2 in which each value from the points column is rounded to 2 decimal places.
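Note that round here is PySpark's pyspark.sql.functions.round, not Python's built-in round function. If you would rather overwrite the existing points column instead of adding a new one, a minimal variant (assuming the same df) looks like this:

from pyspark.sql.functions import round

#overwrite the existing points column with its values rounded to 2 decimal places
df_rounded = df.withColumn('points', round(df.points, 2))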
The following example shows how to use this syntax in practice.
Example: Round Column Values to 2 Decimal Places in PySpark
Suppose we have the following PySpark DataFrame that contains information about points scored by various basketball players:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['Mavs', 18.3494],
        ['Nets', 33.5541],
        ['Lakers', 12.6711],
        ['Kings', 15.6588],
        ['Hawks', 19.3215],
        ['Wizards', 24.0399],
        ['Magic', 28.6843],
        ['Jazz', 40.0001],
        ['Thunder', 24.2365],
        ['Spurs', 13.9446]]

#define column names
columns = ['team', 'points']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#view dataframe
df.show()

+-------+-------+
|   team| points|
+-------+-------+
|   Mavs|18.3494|
|   Nets|33.5541|
| Lakers|12.6711|
|  Kings|15.6588|
|  Hawks|19.3215|
|Wizards|24.0399|
|  Magic|28.6843|
|   Jazz|40.0001|
|Thunder|24.2365|
|  Spurs|13.9446|
+-------+-------+
Suppose we would like to round each of the values in the points column to 2 decimal places.
We can use the following syntax to do so:
from pyspark.sql.functions import round

#create new column that rounds values in points column to 2 decimal places
df_new = df.withColumn('points2', round(df.points, 2))

#view new DataFrame
df_new.show()

+-------+-------+-------+
|   team| points|points2|
+-------+-------+-------+
|   Mavs|18.3494|  18.35|
|   Nets|33.5541|  33.55|
| Lakers|12.6711|  12.67|
|  Kings|15.6588|  15.66|
|  Hawks|19.3215|  19.32|
|Wizards|24.0399|  24.04|
|  Magic|28.6843|  28.68|
|   Jazz|40.0001|   40.0|
|Thunder|24.2365|  24.24|
|  Spurs|13.9446|  13.94|
+-------+-------+-------+
Notice that the new column named points2 contains each of the values from the points column rounded to 2 decimal places.
For example:
- 18.3494 has been rounded to 18.35.
- 33.5541 has been rounded to 33.55.
- 12.6711 has been rounded to 12.67.
And so on.
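One detail worth noting: round returns a numeric column, so trailing zeros are not preserved, which is why 40.0001 appears as 40.0 rather than 40.00. If you need every value displayed with exactly 2 decimal places, one option (a sketch, assuming the df defined above) is PySpark's format_number function, which returns the formatted values as a string column:

from pyspark.sql.functions import format_number

#create a string column that always displays exactly 2 decimal places
df_fmt = df.withColumn('points2', format_number(df.points, 2))

#view the formatted DataFrame
df_fmt.show()

Keep in mind that the resulting points2 column is a string, so it is suited for display rather than further numeric calculations.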
Note: You can find the complete documentation for the PySpark round function here.
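Also note that round uses HALF_UP rounding. If you need HALF_EVEN ("banker's") rounding instead, PySpark provides a bround function with the same signature. A minimal sketch, again assuming the df defined above:

from pyspark.sql.functions import bround

#round values in points column to 2 decimal places using HALF_EVEN rounding
df_bround = df.withColumn('points2', bround(df.points, 2))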
Additional Resources
The following tutorials explain how to perform other common tasks in PySpark:
PySpark: How to Select Columns by Index in DataFrame
PySpark: How to Check Data Type of Columns in DataFrame
PySpark: How to Print One Column of a DataFrame