Hyperparameter Tuning: GridSearchCV and RandomizedSearchCV, Explained

November 3, 2023
in Data Science & ML




When training a machine learning model, there are parameters, or model coefficients, that the learning algorithm estimates from the training data; these directly determine the model’s performance. In addition, there are hyperparameters that are not learned from the data but are set by the developer before training. These hyperparameters also affect the model’s performance, and they are tunable.

To find the best values for these hyperparameters, we use a process called hyperparameter optimization, or hyperparameter tuning. Two common techniques for hyperparameter tuning are grid search and randomized search. In this guide, we will learn how these techniques work and how to implement them using scikit-learn.

Let’s start by training a simple Support Vector Machine (SVM) classifier on the wine dataset. First, we need to import the required modules and classes:

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
```

The wine dataset is a built-in dataset in scikit-learn. We can load the features and target labels as follows:

```python
wine = datasets.load_wine()
X = wine.data
y = wine.target
```

Next, we split the dataset into training and testing sets. Here, we use a test size of 0.2, which means 80% of the data is used for training and 20% for testing:

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=24)
```

Now, we can create a baseline SVM classifier, fit the model to the training dataset, and evaluate its performance on the test set:

```python
baseline_svm = SVC()
baseline_svm.fit(X_train, y_train)
y_pred = baseline_svm.predict(X_test)
```

Since this is a multiclass classification problem, we can look at the model’s accuracy:

```python
accuracy = accuracy_score(y_test, y_pred)
print(f"Baseline SVM Accuracy: {accuracy:.2f}")
```

The accuracy score of the model with the default hyperparameters is approximately 0.78. However, a single train-test split may not give a reliable estimate of the model’s performance. We need a better way to evaluate the model and to find the best hyperparameters. This is where cross-validation comes in.

In hyperparameter tuning, we aim to find the best combination of hyperparameter values for our SVM classifier. The commonly tuned hyperparameters for the support vector classifier are C (the regularization strength), kernel (the kernel function), and gamma (the kernel coefficient for the rbf, poly, and sigmoid kernels).

Cross-validation helps assess how well the model generalizes to unseen data and reduces the risk of overfitting to a single split. In k-fold cross-validation, the dataset is split into k equally sized folds; the model is trained k times, with each fold serving once as the validation set and the remaining folds as the training set. This yields one validation score per fold, and their average gives a more reliable estimate of the model’s performance.
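
As a concrete illustration, here is a minimal sketch of 5-fold cross-validation on the baseline model, using scikit-learn’s cross_val_score (variable names follow the code above):

```python
from sklearn.model_selection import cross_val_score

# Evaluate the baseline SVM with 5-fold cross-validation on the
# training split; this returns one accuracy score per fold.
cv_scores = cross_val_score(baseline_svm, X_train, y_train, cv=5)
print(cv_scores)
print(f"Mean CV accuracy: {cv_scores.mean():.2f}")
```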

Grid search is a hyperparameter tuning technique that performs an exhaustive search over a specified hyperparameter space to find the combination of hyperparameters that yields the best model performance. To implement grid search in scikit-learn, we need to import the GridSearchCV class:

```python
from sklearn.model_selection import GridSearchCV
```

We define the hyperparameter search space as a parameter grid, where we specify each hyperparameter and its corresponding values to explore. For example:

```python
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': [0.1, 1, 'scale', 'auto']
}
```

Grid search systematically explores every possible combination of hyperparameters from the parameter grid. We can then instantiate the GridSearchCV object to tune the hyperparameters of the baseline SVM:

```python
grid_search = GridSearchCV(estimator=baseline_svm, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
```

After fitting the model with the grid of hyperparameters, we can evaluate the performance of the best model on the test data:

```python
best_params = grid_search.best_params_
best_model = grid_search.best_estimator_

y_pred_best = best_model.predict(X_test)
accuracy_best = accuracy_score(y_test, y_pred_best)
print(f"Best SVM Accuracy: {accuracy_best:.2f}")
print(f"Best Hyperparameters: {best_params}")
```
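
Besides best_params_ and best_estimator_, the fitted GridSearchCV object also records how every candidate performed during cross-validation. A quick way to inspect this (attribute names are standard scikit-learn):

```python
# Mean cross-validated accuracy of the winning candidate
print(f"Best CV score: {grid_search.best_score_:.2f}")

# cv_results_ stores per-candidate details, e.g. the mean
# validation score for each hyperparameter combination tried
for params, score in zip(grid_search.cv_results_["params"],
                         grid_search.cv_results_["mean_test_score"]):
    print(params, f"{score:.3f}")
```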

Using grid search, we find that the model achieves an accuracy score of 0.94 with the best hyperparameters: C=0.1, gamma=0.1, kernel='poly'.

Grid search explores all specified combinations of hyperparameters, ensuring that we don’t miss the best hyperparameters within the defined search space. However, it can be computationally expensive for complex models or extensive hyperparameter searches.
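
To get a feel for this cost, here is a small sketch using ParameterGrid, the helper scikit-learn uses to enumerate grid candidates, to count the fits our grid requires:

```python
from sklearn.model_selection import ParameterGrid

# 3 values of C x 3 kernels x 4 gammas = 36 candidate combinations;
# with cv=5, GridSearchCV trains the model 36 * 5 = 180 times.
n_candidates = len(ParameterGrid(param_grid))
print(n_candidates)      # 36
print(n_candidates * 5)  # 180 model fits
```

Adding just one more value per hyperparameter would raise this to 4 * 4 * 5 = 80 candidates and 400 fits, which is why the cost grows quickly.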

Randomized search is another hyperparameter tuning technique that explores random combinations of hyperparameters within specified distributions or ranges. It is particularly useful when dealing with a large hyperparameter search space. To implement randomized search, we need to import the RandomizedSearchCV class:

```python
from sklearn.model_selection import RandomizedSearchCV
```

Instead of specifying a grid of values, we can define probability distributions or ranges for each hyperparameter, for example using scipy.stats distributions:

```python
from scipy.stats import uniform
import numpy as np

param_dist = {
    'C': uniform(0.1, 10),  # uniform(loc=0.1, scale=10): samples from [0.1, 10.1)
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': ['scale', 'auto'] + list(np.logspace(-3, 3, 50))
}
```

Randomized search randomly samples a fixed number of combinations of hyperparameters from these distributions. We can then instantiate the RandomizedSearchCV object to search for the best hyperparameters:

```python
randomized_search = RandomizedSearchCV(estimator=baseline_svm, param_distributions=param_dist, n_iter=20, cv=5)
randomized_search.fit(X_train, y_train)
```
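
Note that because the candidates are sampled at random, repeated runs can select different hyperparameters. If reproducibility matters, a fixed seed can be passed via the standard random_state parameter, as in this variant (the seed value is arbitrary):

```python
# Same search, but reproducible and parallelized across CPU cores
randomized_search = RandomizedSearchCV(
    estimator=baseline_svm,
    param_distributions=param_dist,
    n_iter=20,         # number of sampled candidates
    cv=5,
    random_state=42,   # arbitrary seed, fixes the sampling
    n_jobs=-1,         # use all available cores
)
```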

After fitting the model with the random hyperparameter combinations, we can evaluate the performance of the best model on the test data:

```python
best_params_rand = randomized_search.best_params_
best_model_rand = randomized_search.best_estimator_

y_pred_best_rand = best_model_rand.predict(X_test)
accuracy_best_rand = accuracy_score(y_test, y_pred_best_rand)
print(f"Best SVM Accuracy: {accuracy_best_rand:.2f}")
print(f"Best Hyperparameters: {best_params_rand}")
```

Using randomized search, we find the same best accuracy score of 0.94, but with different optimal hyperparameters: C=9.66495227534876, gamma=6.25055192527397, kernel='poly'.

Randomized search allows us to explore a diverse set of hyperparameter combinations efficiently, especially when dealing with a large search space.

In conclusion, grid search and randomized search are two techniques used for hyperparameter tuning in machine learning models. Grid search exhaustively searches through all combinations of hyperparameters, while randomized search randomly samples combinations from specified distributions or ranges. Both techniques can help find the best hyperparameters for a model, but randomized search is particularly useful for large search spaces.


