How to Fix: Input contains NaN, infinity or a value too large for dtype(‘float64’)


One common error you may encounter when using Python is:

ValueError: Input contains infinity or a value too large for dtype('float64').

This error usually occurs when you attempt to use some function from the scikit-learn module, but the DataFrame or matrix you’re using as input has NaN values or infinite values.

The following example shows how to resolve this error in practice.

How to Reproduce the Error

Suppose we have the following pandas DataFrame:

import pandas as pd
import numpy as np

#create DataFrame
df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                   'x2': [1, 3, 3, 5, 2, 2, 1, np.inf, 0, 3, 4],
                   'y': [np.nan, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})

#view DataFrame
print(df)

    x1   x2     y
0    1  1.0   NaN
1    2  3.0  78.0
2    2  3.0  85.0
3    4  5.0  88.0
4    2  2.0  72.0
5    1  2.0  69.0
6    5  1.0  94.0
7    4  inf  94.0
8    2  0.0  88.0
9    4  3.0  92.0
10   4  4.0  90.0

Now suppose we attempt to fit a multiple linear regression model using functions from scikit-learn:

from sklearn.linear_model import LinearRegression

#initiate linear regression model
model = LinearRegression()

#define predictor and response variables
X, y = df[['x1', 'x2']], df.y

#fit regression model
model.fit(X, y)

#print model intercept and coefficients
print(model.intercept_, model.coef_)

ValueError: Input contains infinity or a value too large for dtype('float64').

We receive an error since the DataFrame we’re using has both infinite and NaN values.

How to Fix the Error

The way to resolve this error is to first remove any rows from the DataFrame that contain infinite or NaN values:

#remove rows with any values that are not finite
df_new = df[np.isfinite(df).all(1)]

#view updated DataFrame
print(df_new)

    x1   x2     y
1    2  3.0  78.0
2    2  3.0  85.0
3    4  5.0  88.0
4    2  2.0  72.0
5    1  2.0  69.0
6    5  1.0  94.0
8    2  0.0  88.0
9    4  3.0  92.0
10   4  4.0  90.0

The two rows that had infinite or NaN values have been removed.

We can now proceed to fit our linear regression model:

from sklearn.linear_model import LinearRegression

#initiate linear regression model
model = LinearRegression()

#define predictor and response variables
X, y = df_new[['x1', 'x2']], df_new.y

#fit regression model
model.fit(X, y)

#print model intercept and coefficients
print(model.intercept_, model.coef_)

69.85144124168515 [ 5.72727273 -0.93791574]

Notice that we don’t receive any error this time because we first removed the rows with infinite or NaN values from the DataFrame.

Additional Resources

The following tutorials explain how to fix other common errors in Python:

How to Fix in Python: ‘numpy.ndarray’ object is not callable
How to Fix: TypeError: ‘numpy.float64’ object is not callable
How to Fix: Typeerror: expected string or bytes-like object

Leave a Reply

Your email address will not be published.