How to Use to_numeric in Pandas (With Examples)


Often you may want to convert one or more columns to numeric in a pandas DataFrame.

This can be useful when the data you are working with is known to be numeric but is being recognized as a string or some other data type.

The easiest way to convert one or more columns in a DataFrame to a numeric data type is to use the to_numeric() function, which uses the following syntax:

pandas.to_numeric(arg, errors='raise', downcast=None, …)

where:

  • arg: The argument to convert to numeric
  • errors: How to handle errors during conversion
  • downcast: Whether to cast result to smallest numerical type possible
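As a minimal sketch of how the errors argument behaves, the snippet below uses a small made-up Series (the values, including the 'n/a' entry, are invented for illustration). With the default errors='raise' the conversion fails on the first value that cannot be parsed, while errors='coerce' replaces such values with NaN:

```python
import pandas as pd

#hypothetical data with one value that cannot be parsed as a number
raw = pd.Series(['25', '12', 'n/a', '19'])

#errors='raise' (the default) stops with a ValueError on 'n/a'
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True

#errors='coerce' turns unparseable values into NaN instead
coerced = pd.to_numeric(raw, errors='coerce')

print(raised)
print(coerced)
```

Coercing is convenient for messy data, but it can hide bad values, so it may be worth counting the NaN values afterwards with coerced.isna().sum().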

The following example shows how to use the to_numeric() function in practice with a pandas DataFrame.

Example: How to Use the to_numeric() Function in Pandas

Suppose we create the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['Mavs', 'Mavs', 'Heat', 'Heat', 'Kings'],
                   'points': ['25', '12', '15', '14', '19'],
                   'assists': ['5', '7', '7', '9', '12'],
                   'rebounds': [11, 8, 10, 6, 6]})

#view DataFrame
print(df)

    team  points  assists  rebounds
0   Mavs      25        5        11
1   Mavs      12        7         8
2   Heat      15        7        10
3   Heat      14        9         6
4  Kings      19       12         6

The DataFrame contains information about various basketball players, including the team they play for, total points scored, total assists and total rebounds.

We can use the dtypes attribute to display the data type of each column in the DataFrame:

#view data type of each column in DataFrame
df.dtypes

team        object
points      object
assists     object
rebounds     int64
dtype: object

From the output we can see:

  • The team column is an object, which is how pandas stores strings here.
  • The points column is an object, which is how pandas stores strings here.
  • The assists column is an object, which is how pandas stores strings here.
  • The rebounds column is an integer.

Suppose that we would like to convert the points column from an “object” data type to a numeric data type.

We can use the following syntax to do so:

#convert points column to numeric
df['points'] = pd.to_numeric(df['points'])

#display data type of each column in DataFrame
df.dtypes

team        object
points       int64
assists     object
rebounds     int64
dtype: object

From the output we can see that the points column now has a data type of int64, which is an integer.

Notice that all other columns in the DataFrame have retained their original data type.
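To convert more than one column at once, one common approach is to pass to_numeric() to the DataFrame apply() method. The sketch below rebuilds the same DataFrame under a separate name (df2, chosen here only so the example is self-contained) and converts both the points and assists columns:

```python
import pandas as pd

#recreate the example DataFrame under a separate name
df2 = pd.DataFrame({'team': ['Mavs', 'Mavs', 'Heat', 'Heat', 'Kings'],
                    'points': ['25', '12', '15', '14', '19'],
                    'assists': ['5', '7', '7', '9', '12'],
                    'rebounds': [11, 8, 10, 6, 6]})

#convert points and assists columns to numeric
cols = ['points', 'assists']
df2[cols] = df2[cols].apply(pd.to_numeric)

#display data type of each column in DataFrame
print(df2.dtypes)
```

Here apply() calls to_numeric() once per selected column, so the team and rebounds columns are left untouched.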

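By default, the to_numeric() function returns either an int64 or a float64 data type depending on the values, so the points column became int64 because every value was a whole number. We can use the downcast argument to specify that we'd like a column to be converted to the smallest floating point data type that can hold the values instead: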

#convert points column to smallest possible float data type
df['points'] = pd.to_numeric(df['points'], downcast='float')

#display data type of each column in DataFrame
df.dtypes

team         object
points      float32
assists      object
rebounds      int64
dtype: object

From the output we can see that the points column now has a data type of float32, which is a floating point number.

All other columns have retained their original data type.
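The following sketch compares the default behavior with the downcast argument on a few made-up lists of strings (the values are invented for illustration). Note that when a plain list is passed instead of a Series, to_numeric() returns a NumPy array:

```python
import pandas as pd

#whole numbers are converted to int64 by default
whole = pd.to_numeric(['25', '12', '15'])

#a single decimal value makes the default result float64
decimal = pd.to_numeric(['25.5', '12', '15'])

#downcast='integer' picks the smallest integer type that fits the values
small = pd.to_numeric(['25', '12', '15'], downcast='integer')

print(whole.dtype, decimal.dtype, small.dtype)
#int64 float64 int8
```

Downcasting can reduce memory usage for large columns, although a small type such as int8 only holds values from -128 to 127, so it is worth checking that later calculations will not exceed that range.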

Note: You can find the complete documentation for the to_numeric() function in the official pandas documentation.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Select Only Numeric Columns in Pandas
How to Convert Categorical Variable to Numeric in Pandas
How to Extract Number from String in Pandas
