How to Select Only Numeric Columns in Pandas


You can use the following basic syntax to select only numeric columns in a pandas DataFrame:

import pandas as pd
import numpy as np

df.select_dtypes(include=np.number)
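As a minimal sketch, pandas also accepts the string 'number' as an alias for np.number, which lets you skip the NumPy import. The small DataFrame and its minutes column below are hypothetical, made up only for illustration:

```python
import pandas as pd

# Hypothetical DataFrame used only to illustrate the string alias
df = pd.DataFrame({'team': ['A', 'B'],
                   'points': [18, 22],
                   'minutes': [30.5, 28.0]})

# 'number' is the string form of np.number accepted by select_dtypes()
numeric_df = df.select_dtypes(include='number')
print(numeric_df.columns.tolist())  # ['points', 'minutes']
```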

The following example shows how to use this method in practice.

Example: Select Only Numeric Columns in Pandas

Suppose we have the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#view DataFrame
print(df)

  team  points  assists  rebounds
0    A      18        5        11
1    B      22        7         8
2    C      19        7        10
3    D      14        9         6
4    E      14       12         6
5    F      11        9         5
6    G      20        9         9
7    H      28        4        12

We can use the following syntax to select only the numeric columns in the DataFrame:

import numpy as np

#select only the numeric columns in the DataFrame
df.select_dtypes(include=np.number)

   points  assists  rebounds
0      18        5        11
1      22        7         8
2      19        7        10
3      14        9         6
4      14       12         6
5      11        9         5
6      20        9         9
7      28        4        12
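Note that np.number matches float columns as well as integer columns, but not boolean columns, because NumPy's bool type is not a subclass of np.number. The sketch below uses a hypothetical DataFrame with made-up minutes (float) and starter (bool) columns to show this, along with one way to add booleans back by listing 'bool' explicitly:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame mixing string, integer, float, and boolean columns
df = pd.DataFrame({'team': ['A', 'B', 'C'],
                   'points': [18, 22, 19],
                   'minutes': [31.5, 28.0, 33.2],
                   'starter': [True, False, True]})

# np.number matches int64 and float64, but not bool
numeric_cols = df.select_dtypes(include=np.number).columns.tolist()
print(numeric_cols)  # ['points', 'minutes']

# Passing a list of types also keeps the boolean column
with_bool = df.select_dtypes(include=[np.number, 'bool']).columns.tolist()
print(with_bool)  # ['points', 'minutes', 'starter']
```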

Notice that only the three numeric columns (points, assists, and rebounds) have been selected.

We can verify that these columns are numeric by using the dtypes attribute to display the data type of each column in the DataFrame:

#display data type of each variable in DataFrame
df.dtypes

team        object
points       int64
assists      int64
rebounds     int64
dtype: object

From the output we can see that team has the object dtype (here it holds strings), while points, assists, and rebounds all have the int64 dtype, which is numeric.
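The inverse selection can be sketched with the exclude argument of select_dtypes(), which keeps every column whose type is not numeric. This reuses the basketball DataFrame from above:

```python
import numpy as np
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#select only the non-numeric columns in the DataFrame
non_numeric = df.select_dtypes(exclude=np.number)
print(non_numeric.columns.tolist())  # ['team']
```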

Note that we can also use the following code to get a list of the numeric columns in the DataFrame:

#display list of numeric variables in DataFrame
df.select_dtypes(include=np.number).columns.tolist()

['points', 'assists', 'rebounds']

This allows us to quickly see the names of the numeric variables in the DataFrame without seeing their actual values.
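One practical use of this list is restricting a calculation to the numeric columns. In pandas 2.0 and later, calling df.mean() on a DataFrame that still contains the string team column raises a TypeError unless numeric_only=True is passed, so selecting the numeric columns first is one way to avoid that. A minimal sketch using the same data:

```python
import numpy as np
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#get list of numeric columns, then average only those columns
numeric_cols = df.select_dtypes(include=np.number).columns.tolist()
col_means = df[numeric_cols].mean()

#points: 18.25, assists: 7.75, rebounds: 8.375
print(col_means)
```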

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Select Columns by Name in Pandas
How to Select Columns by Index in Pandas
How to Select Columns Containing a Specific String in Pandas
