You can use the following methods in PySpark to check the data type of columns in a DataFrame:
Method 1: Check Data Type of One Specific Column
#return data type of 'conference' column
dict(df.dtypes)['conference']
Method 2: Check Data Type of All Columns
#return data type of all columns
df.dtypes
The following examples show how to use each method in practice with the following PySpark DataFrame:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', 'East', 11, 4],
        ['A', None, 8, 9],
        ['A', 'East', 10, 3],
        ['B', 'West', None, 12],
        ['B', 'West', None, 4],
        ['C', 'East', 5, 2]]

#define column names
columns = ['team', 'conference', 'points', 'assists']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#view dataframe
df.show()

+----+----------+------+-------+
|team|conference|points|assists|
+----+----------+------+-------+
|   A|      East|    11|      4|
|   A|      null|     8|      9|
|   A|      East|    10|      3|
|   B|      West|  null|     12|
|   B|      West|  null|      4|
|   C|      East|     5|      2|
+----+----------+------+-------+
Example 1: Check Data Type of One Specific Column
We can use the following syntax to check the data type of the conference column in the DataFrame:
#return data type of 'conference' column
dict(df.dtypes)['conference']

'string'
The output tells us that the conference column has a data type of string.
To check the data type of another specific column, simply replace conference with a different column name:
#return data type of 'points' column
dict(df.dtypes)['points']

'bigint'
The output tells us that the points column has a data type of bigint.
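Because dict(df.dtypes) is an ordinary Python dictionary, the lookup can also be used programmatically. For example, the following sketch (a hypothetical extension using the same df defined above) casts the points column to an integer type only if its current data type is bigint:

#hypothetical example: cast 'points' to int only if it is currently bigint
from pyspark.sql.functions import col

if dict(df.dtypes)['points'] == 'bigint':
    df = df.withColumn('points', col('points').cast('int'))

#the 'points' column would now be reported with a data type of 'int'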
Example 2: Check Data Type of All Columns
We can use the following syntax to check the data type of all columns in the DataFrame:
#return data type of all columns
df.dtypes
[('team', 'string'),
('conference', 'string'),
('points', 'bigint'),
('assists', 'bigint')]
The output shows each column name along with its data type.
For example, we can see:
- The team column has a data type of string.
- The conference column has a data type of string.
- The points column has a data type of bigint.
- The assists column has a data type of bigint.
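Since df.dtypes is simply a list of (column name, data type) tuples, it can also be used to pull out only the columns of a given type. The following sketch (a hypothetical example using the same df) builds a list of the bigint column names:

#hypothetical example: get the names of all bigint columns
numeric_cols = [name for name, dtype in df.dtypes if dtype == 'bigint']

numeric_cols

['points', 'assists']

A related option is df.printSchema(), which prints the schema of the DataFrame in a tree format, including whether each column allows null values.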
Additional Resources
The following tutorials explain how to perform other common tasks in PySpark:
PySpark: How to Check if Column Exists in DataFrame
PySpark: How to Select Columns by Index in DataFrame
PySpark: How to Print One Column of a DataFrame