The Complete Guide to Pandas dtypes


One of the most common Python libraries used for data analysis is pandas.

Within pandas, you can use the dtype attribute to check the “data type” of a particular Series or of a column in a pandas DataFrame.

There are five main dtypes in pandas:

  • object: Text or mixed numeric and non-numeric values
  • bool: True or False values
  • int64: Integer values
  • float64: Floating point values
  • datetime64: Date and time values
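As a rough illustration of the list above, the sketch below builds a small DataFrame with one column per dtype. The column names and values are made up for this example, and the exact label printed for the datetime column (for instance datetime64[ns]) can vary with the pandas version.

```python
import pandas as pd

# Hypothetical example data: one column for each dtype in the list above
example = pd.DataFrame({
    'mixed': ['a', 1, 2.5],          # text mixed with numbers -> object
    'flag': [True, False, True],     # True/False values -> bool
    'count': [1, 2, 3],              # whole numbers -> int64
    'ratio': [0.5, 1.5, 2.5],        # decimal numbers -> float64
    'when': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03']),
})

# Map each column name to the name of its dtype
dtype_names = {col: str(dtype) for col, dtype in example.dtypes.items()}
print(dtype_names)
```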

It’s useful to know the dtypes of objects in pandas because the dtype affects how calculations are performed, and it can help explain errors or unexpected results when you perform certain operations.
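To make this concrete, here is a minimal sketch of how a dtype can change the result of a calculation. It assumes a hypothetical column of numbers that were stored as text (object dtype), which is not part of the example used later in this guide.

```python
import pandas as pd

# Hypothetical column: numbers stored as text, so the dtype is object
points_text = pd.Series(['18', '22', '19'], dtype='object')

# Summing strings concatenates them instead of adding the numbers
text_total = points_text.sum()

# Converting to a numeric dtype gives the arithmetic sum
points_num = pd.to_numeric(points_text)
num_total = points_num.sum()

print(points_text.dtype, repr(text_total))  # object '182219'
print(points_num.dtype, num_total)          # int64 59
```

Checking the dtype first would reveal that the column holds text, which explains the unexpected "sum".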

In practice, you can check the dtype of a single column in a pandas DataFrame or a single pandas Series by using the following syntax:

df['some_column'].dtype

This will return the dtype for the column that we specify.

Or, you can use the dtypes attribute to return the data type of every column in a pandas DataFrame:

df.dtypes

This will return the dtype of each column, which is particularly useful so that we don’t have to write a for-loop or type out dtype multiple times to find the data type of each column.

The following example shows how to check the dtype of columns in a pandas DataFrame in practice.

Example: How to Use dtype and dtypes in Pandas

Suppose we create the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'points': [18, 22, 19, 14, 14, 11],
                   'assists': [5, 7, 7, 9, 12, 9],
                   'minutes': [2.1, 4, 5.8, 9, 9.2, 3.5],
                   'all_star': [True, False, False, True, True, True]})

#view DataFrame
print(df)

  team  points  assists  minutes  all_star
0    A      18        5      2.1      True
1    A      22        7      4.0     False
2    A      19        7      5.8     False
3    B      14        9      9.0      True
4    B      14       12      9.2      True
5    B      11        9      3.5      True

We can see that the DataFrame has five total columns.

Suppose that we would like to display the data type of just the assists column.

We can use the following syntax to do so:

#display data type of 'assists' column
df['assists'].dtype

dtype('int64')

This returns dtype('int64'), which tells us that the assists column holds 64-bit integer values.

If we’d like, we can also use the following syntax to display the data type of just the assists and minutes columns:

#display data type of 'assists' and 'minutes' columns
df[['assists', 'minutes']].dtypes

assists      int64
minutes    float64
dtype: object

Note that when specifying multiple columns, we must use double brackets or else we will receive an error.
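As a small sketch of this point, the snippet below rebuilds just the two columns from the example (shortened to three rows) and tries both forms. With single brackets, pandas treats the two names as one key, which here raises a KeyError.

```python
import pandas as pd

# Shortened copy of the two columns from the example DataFrame
df = pd.DataFrame({'assists': [5, 7, 7],
                   'minutes': [2.1, 4, 5.8]})

# Single brackets: the two names are treated as a single column key
try:
    df['assists', 'minutes'].dtypes
    error_name = None
except KeyError as err:
    error_name = type(err).__name__

# Double brackets: a list of column names selects a smaller DataFrame
subset_dtypes = df[['assists', 'minutes']].dtypes

print(error_name)
print(subset_dtypes)
```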

From the output we can see that the assists column is an integer data type while the minutes column is a floating point data type.

This makes sense, since the minutes column contains decimal values that represent fractional minutes played in a game.

Lastly, we can use the following syntax to display the data type of each column in the pandas DataFrame:

#display data type of each column in DataFrame
df.dtypes

team         object
points        int64
assists       int64
minutes     float64
all_star       bool
dtype: object

The output shows the data type of each column in the DataFrame.
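The last line of the output, dtype: object, can be confusing at first. The sketch below, which uses a shortened two-column version of the example data, suggests why it appears: df.dtypes is itself a pandas Series whose values are dtype objects, so the Series reports object as its own dtype.

```python
import pandas as pd

# Shortened two-column version of the example DataFrame
df = pd.DataFrame({'points': [18, 22, 19],
                   'minutes': [2.1, 4.0, 5.8]})

col_types = df.dtypes

print(type(col_types).__name__)  # Series
print(col_types.dtype)           # object (the dtype of the Series itself)
print(col_types['minutes'])      # float64 (the dtype of one column)
```

Because the result is a Series indexed by column name, a single column's dtype can also be looked up by name, as in the last line.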

Note: In practice, df.dtypes is one of the attributes you will check most often when analyzing real-world data, since it shows the underlying data types that you’re working with in a particular DataFrame.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: How to Specify dtypes when Importing Excel File
Pandas: How to Specify dtypes when Importing CSV File
Pandas: How to Check dtype for All Columns in DataFrame
