One common error you may encounter when using Python is:

**ValueError: Input contains infinity or a value too large for dtype('float64').**

This error usually occurs when you pass a DataFrame or matrix that contains NaN or infinite values to a function from the scikit-learn library.

The following example shows how to resolve this error in practice.

**How to Reproduce the Error**

Suppose we have the following pandas DataFrame:

    import pandas as pd
    import numpy as np

    # create DataFrame
    df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                       'x2': [1, 3, 3, 5, 2, 2, 1, np.inf, 0, 3, 4],
                       'y': [np.nan, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})

    # view DataFrame
    print(df)

        x1   x2     y
    0    1  1.0   NaN
    1    2  3.0  78.0
    2    2  3.0  85.0
    3    4  5.0  88.0
    4    2  2.0  72.0
    5    1  2.0  69.0
    6    5  1.0  94.0
    7    4  inf  94.0
    8    2  0.0  88.0
    9    4  3.0  92.0
    10   4  4.0  90.0

Now suppose we attempt to fit a multiple linear regression model using functions from scikit-learn:

    from sklearn.linear_model import LinearRegression

    # initiate linear regression model
    model = LinearRegression()

    # define predictor and response variables
    X, y = df[['x1', 'x2']], df.y

    # fit regression model
    model.fit(X, y)

    # print model intercept and coefficients
    print(model.intercept_, model.coef_)

    ValueError: Input contains infinity or a value too large for dtype('float64').

We receive an error since the DataFrame we’re using has both infinite and NaN values.
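Before removing anything, it can help to see where the problem values are. The following is a minimal sketch, not part of the original walkthrough, that counts NaN and infinite values in each column of the same DataFrame. The names `nan_counts` and `inf_counts` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# same data as the example DataFrame above
df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                   'x2': [1, 3, 3, 5, 2, 2, 1, np.inf, 0, 3, 4],
                   'y': [np.nan, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})

# count NaN values in each column
nan_counts = df.isna().sum()

# count positive or negative infinity in each column
inf_counts = df.isin([np.inf, -np.inf]).sum()

# show both counts side by side
print(pd.DataFrame({'nan': nan_counts, 'inf': inf_counts}))
```

For this data, the counts show one infinite value in `x2` and one NaN value in `y`.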

**How to Fix the Error**

The way to resolve this error is to first remove any rows from the DataFrame that contain infinite or NaN values:

    # remove rows with any values that are not finite
    df_new = df[np.isfinite(df).all(axis=1)]

    # view updated DataFrame
    print(df_new)

        x1   x2     y
    1    2  3.0  78.0
    2    2  3.0  85.0
    3    4  5.0  88.0
    4    2  2.0  72.0
    5    1  2.0  69.0
    6    5  1.0  94.0
    8    2  0.0  88.0
    9    4  3.0  92.0
    10   4  4.0  90.0

The two rows that had infinite or NaN values have been removed.
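Another common way to get the same result, shown here as an alternative sketch rather than the method this tutorial uses, is to convert infinite values to NaN and then drop every row with a missing value. The name `df_clean` is an assumption made for this example.

```python
import numpy as np
import pandas as pd

# same data as the example DataFrame above
df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                   'x2': [1, 3, 3, 5, 2, 2, 1, np.inf, 0, 3, 4],
                   'y': [np.nan, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})

# turn +inf and -inf into NaN, then drop every row that contains a NaN
df_clean = df.replace([np.inf, -np.inf], np.nan).dropna()

# view the remaining row labels
print(df_clean.index.tolist())
```

This keeps the same nine rows as `df_new`, so the rest of the tutorial applies unchanged.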

We can now proceed to fit our linear regression model:

    from sklearn.linear_model import LinearRegression

    # initiate linear regression model
    model = LinearRegression()

    # define predictor and response variables
    X, y = df_new[['x1', 'x2']], df_new.y

    # fit regression model
    model.fit(X, y)

    # print model intercept and coefficients
    print(model.intercept_, model.coef_)

    69.85144124168515 [ 5.72727273 -0.93791574]

Notice that we don’t receive any error this time because we first removed the rows with infinite or NaN values from the DataFrame.
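To catch this problem before it reaches scikit-learn, you could run a small check of your own first. The sketch below uses a hypothetical helper, `require_finite`, which is not part of pandas or scikit-learn. It raises a `ValueError` that names the columns holding NaN or infinite values, and it assumes every column is numeric.

```python
import numpy as np
import pandas as pd


def require_finite(frame: pd.DataFrame) -> None:
    """Hypothetical helper: raise ValueError naming columns with NaN or inf."""
    # np.isfinite is False for NaN, inf and -inf
    finite = np.isfinite(frame.to_numpy(dtype=float))
    # a column is acceptable only if every one of its values is finite
    bad_columns = [str(col)
                   for col, ok in zip(frame.columns, finite.all(axis=0))
                   if not ok]
    if bad_columns:
        raise ValueError(f"Non-finite values found in columns: {bad_columns}")


# small illustrative DataFrame with one inf and one NaN
df = pd.DataFrame({'x1': [1, 2, 4],
                   'x2': [1.0, np.inf, 3.0],
                   'y': [np.nan, 78.0, 85.0]})

try:
    require_finite(df)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

Calling the helper on a DataFrame that has already been cleaned returns without raising, so it can sit directly before `model.fit`.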

