You can use the following syntax to convert a column from a Boolean to an integer in PySpark:

```python
from pyspark.sql.functions import when

# convert Boolean column to integer column
df_new = df.withColumn('int_column', when(df.bool_column == True, 1).otherwise(0))
```

This particular example creates a new integer column named **int_column** from the Boolean column named **bool_column**.

Each **True** value in the Boolean column becomes **1** in the integer column. Each **False** value becomes **0**, and because **otherwise(0)** catches every row whose value is not **True**, null values also become **0**.

The following example shows how to use this syntax in practice.

**Example: Convert Boolean Column to Integer in PySpark**

Suppose we have the following PySpark DataFrame that contains information about various basketball teams:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# define data
data = [['Mavs', 18, True],
        ['Nets', 33, True],
        ['Lakers', 12, False],
        ['Kings', 15, True],
        ['Hawks', 19, False],
        ['Wizards', 24, False],
        ['Magic', 28, True],
        ['Jazz', 40, False],
        ['Thunder', 24, False],
        ['Spurs', 13, True]]

# define column names
columns = ['team', 'points', 'playoffs']

# create dataframe using data and column names
df = spark.createDataFrame(data, columns)

# view dataframe
df.show()

+-------+------+--------+
|   team|points|playoffs|
+-------+------+--------+
|   Mavs|    18|    true|
|   Nets|    33|    true|
| Lakers|    12|   false|
|  Kings|    15|    true|
|  Hawks|    19|   false|
|Wizards|    24|   false|
|  Magic|    28|    true|
|   Jazz|    40|   false|
|Thunder|    24|   false|
|  Spurs|    13|    true|
+-------+------+--------+
```

The **playoffs** column is a Boolean column that contains the values **true** and **false** to indicate whether or not each team made the playoffs.

We can use the following syntax to create a new column called **playoffs_int** that converts the Boolean values **true** and **false** to the integer values **1** and **0**, respectively:

```python
from pyspark.sql.functions import when

# convert Boolean column to integer column
df_new = df.withColumn('playoffs_int', when(df.playoffs == True, 1).otherwise(0))

# view new DataFrame
df_new.show()

+-------+------+--------+------------+
|   team|points|playoffs|playoffs_int|
+-------+------+--------+------------+
|   Mavs|    18|    true|           1|
|   Nets|    33|    true|           1|
| Lakers|    12|   false|           0|
|  Kings|    15|    true|           1|
|  Hawks|    19|   false|           0|
|Wizards|    24|   false|           0|
|  Magic|    28|    true|           1|
|   Jazz|    40|   false|           0|
|Thunder|    24|   false|           0|
|  Spurs|    13|    true|           1|
+-------+------+--------+------------+
```

The new **playoffs_int** column contains **1** for every team whose **playoffs** value is **true** and **0** for every team whose value is **false**.

We can use the **dtypes** attribute to view the data type of each column in this new DataFrame and verify that the new column is indeed an integer column:

```python
# display data type of each column
df_new.dtypes

[('team', 'string'), ('points', 'bigint'), ('playoffs', 'boolean'), ('playoffs_int', 'int')]
```

We can see that the new **playoffs_int** column is indeed an integer column.

