You can use the following methods to calculate the mode of a column in a PySpark DataFrame:

**Method 1: Calculate Mode for One Specific Column**

```python
#calculate mode of 'conference' column
df.groupby('conference').count().orderBy('count', ascending=False).first()[0]
```

**Method 2: Calculate Mode for All Columns**

```python
#calculate mode of each column in the DataFrame
[[i, df.groupby(i).count().orderBy('count', ascending=False)\
    .first()[0]] for i in df.columns]
```

The following examples show how to use each method in practice with this PySpark DataFrame:

```python
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', 'East', 11, 4],
        ['A', 'East', 8, 9],
        ['A', 'East', 10, 3],
        ['B', 'West', 6, 12],
        ['B', 'West', 6, 4],
        ['C', 'East', 5, 2]]

#define column names
columns = ['team', 'conference', 'points', 'assists']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#view dataframe
df.show()

+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
|   A|      East|    11|      4|
|   A|      East|     8|      9|
|   A|      East|    10|      3|
|   B|      West|     6|     12|
|   B|      West|     6|      4|
|   C|      East|     5|      2|
+----+----------+------+-------+
```

**Example 1: Calculate Mode for One Specific Column**

We can use the following syntax to calculate the mode of the **conference** column of the DataFrame only:

```python
#calculate mode of 'conference' column
df.groupby('conference').count().orderBy('count', ascending=False).first()[0]

'East'
```

The mode of the **conference** column is **East**.

This represents the most frequently occurring value in the **conference** column.
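The same count-and-sort logic can be sketched outside of Spark with plain Python's `collections.Counter`, using the values from the **conference** column above (a minimal sketch of the idea, not the PySpark API):

```python
from collections import Counter

#values from the 'conference' column of the example DataFrame
conference = ['East', 'East', 'East', 'West', 'West', 'East']

#most_common(1) returns the (value, count) pair with the highest count,
#mirroring groupby().count().orderBy(...).first()
mode = Counter(conference).most_common(1)[0][0]
print(mode)  # East
```

If you are on Spark 3.4 or newer, note that PySpark also ships a built-in `mode` aggregate function in `pyspark.sql.functions` that can replace the manual groupby-and-count pattern.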

**Example 2: Calculate Mode for All Columns**

We can use the following syntax to calculate the mode in each column of the DataFrame:

```python
#calculate mode of each column in the DataFrame
[[i, df.groupby(i).count().orderBy('count', ascending=False)\
    .first()[0]] for i in df.columns]

[['team', 'A'], ['conference', 'East'], ['points', 6], ['assists', 4]]
```

The output shows the mode for each column in the DataFrame.

For example, we can see:

- The mode of the **team** column is ‘A’
- The mode of the **conference** column is ‘East’
- The mode of the **points** column is 6
- The mode of the **assists** column is 4

**Note**: In both examples, we used the **groupby** and **count** functions to count the occurrences of each unique value in the column, then we simply extracted the value with the highest frequency count to get the mode.
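This per-column logic can also be sketched in plain Python over the same example data, looping over each column, counting values, and keeping the most frequent one (a sketch of the logic only, not the PySpark API):

```python
from collections import Counter

#example data and column names from the DataFrame above
data = [['A', 'East', 11, 4],
        ['A', 'East', 8, 9],
        ['A', 'East', 10, 3],
        ['B', 'West', 6, 12],
        ['B', 'West', 6, 4],
        ['C', 'East', 5, 2]]
columns = ['team', 'conference', 'points', 'assists']

#for each column index, count the values and keep the most frequent one
modes = [[col, Counter(row[i] for row in data).most_common(1)[0][0]]
         for i, col in enumerate(columns)]
print(modes)
# [['team', 'A'], ['conference', 'East'], ['points', 6], ['assists', 4]]
```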

**Additional Resources**

The following tutorials explain how to perform other common tasks in PySpark:

How to Calculate the Median of a Column in PySpark

How to Calculate the Mean of a Column in PySpark

How to Calculate the Max Value of a Column in PySpark