How to Use GridSearchCV with Scikit-learn for Optimizing Model Parameters

Let’s learn how to optimize model hyperparameters with Scikit-learn’s GridSearchCV.

Preparation

First, let’s install the Pandas and Scikit-learn packages if you don’t already have them in your environment.

pip install -U pandas scikit-learn

Let’s import the Python packages used in this tutorial.

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

Next, we will load our sample data. For this tutorial, we will use the Iris dataset.

# Load and split the dataset
iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

With the data ready, we can use GridSearchCV to optimize our model.

Model Optimization with GridSearchCV

Model optimization is the process of improving the performance of a machine learning model by fine-tuning its hyperparameters. Hyperparameters are the configuration settings we choose before training begins, as opposed to the model parameters the algorithm learns from the data during training. By tweaking the hyperparameters, we can improve the model’s performance.
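For example, with a Random Forest, a setting such as n_estimators is a hyperparameter we choose up front, while attributes such as feature_importances_ are parameters the model learns from the data. Here is a minimal sketch of that difference, reusing the training split from above.

# Hyperparameters are set by us before training
rf = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)

# Model parameters are learned from the data during fit
rf.fit(X_train, y_train)
print(rf.feature_importances_)  # learned values, not set by us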

GridSearchCV is a Scikit-learn class that automates the process of hyperparameter tuning. It performs an exhaustive search over a set of hyperparameter values, evaluates each combination using cross-validation, and returns the best combination according to the chosen scoring metric.

Let’s use GridSearchCV to optimize the model. First, we define the model.

# Define the model
model = RandomForestClassifier()

Then, we need to define the hyperparameters we want to evaluate. You need to understand a model’s hyperparameters before you can tune them. In this example, we will use three hyperparameters of the Random Forest algorithm: the number of trees, the maximum tree depth, and the minimum number of samples required to split a node.

# Define the hyperparameter grid
param_grid = {
    'n_estimators': [50, 100, 150, 200, 300],
    'max_depth': [None, 5, 10, 15, 20, 30],
    'min_samples_split': [2, 5, 10, 15, 20]
}

We can run GridSearchCV with the hyperparameter grid above using the following code. Note that this grid contains 5 × 6 × 5 = 150 combinations, and with 5-fold cross-validation GridSearchCV fits 750 models, so the search can take a little while.

# Set up the search: 5-fold CV, accuracy scoring, all CPU cores (n_jobs=-1)
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, scoring='accuracy', cv=5, n_jobs=-1)

# Running the GridSearchCV
grid_search.fit(X_train, y_train)

Lastly, the code below retrieves the best hyperparameters and the best cross-validation score.

print("Best Hyperparameters:", grid_search.best_params_)
print("Best Score:", grid_search.best_score_)

The output:

Best Hyperparameters: {'max_depth': None, 'min_samples_split': 2, 'n_estimators': 200}
Best Score: 0.9583333333333334
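Since we held out a test set earlier with train_test_split, it is also worth checking how the tuned model generalizes. By default (refit=True), GridSearchCV refits the best combination on the whole training set and exposes it as best_estimator_, so a minimal check could look like this.

# Evaluate the refitted best model on the held-out test set
best_model = grid_search.best_estimator_
print("Test Accuracy:", best_model.score(X_test, y_test))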

That’s all you need to perform hyperparameter optimization with GridSearchCV. You can tweak the hyperparameter grid and the number of CV folds to see whether you can get a better result; inspecting the full cross-validation results, as sketched below, is a good way to decide what to change. Try to master this method to improve your machine learning models.
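For instance, grid_search.cv_results_ holds the scores for every combination that was evaluated, and it converts cleanly into a Pandas DataFrame (this is where the pandas import comes in). A short sketch of that inspection:

# Inspect all evaluated combinations, sorted by cross-validation rank
results = pd.DataFrame(grid_search.cv_results_)
cols = ['param_n_estimators', 'param_max_depth', 'param_min_samples_split',
        'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[cols].sort_values('rank_test_score').head(10))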
