When training a machine learning model, there are parameters, or model coefficients, that are learned from the data during training and directly influence the model’s performance. In addition, there are hyperparameters that are not learned by the model but are set by the developer before training. These hyperparameters also affect the model’s performance, and they are the values we can tune.
To find the best values for these hyperparameters, we use a process called hyperparameter optimization or hyperparameter tuning. The two common techniques for hyperparameter tuning are grid search and randomized search. In this guide, we will learn how these techniques work and how to implement them using scikit-learn.
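As a quick illustration (not part of the original walkthrough), the minimal sketch below contrasts the two for scikit-learn’s SVC: hyperparameters are passed to the constructor before training, while model parameters only appear as attributes after fitting.

```python
from sklearn.svm import SVC

# Hyperparameters: chosen by the developer before training.
clf = SVC(C=1.0, kernel="rbf", gamma="scale")

# Model parameters: learned from the data during fit(), e.g. the support
# vectors and dual coefficients, exposed after training as fitted
# attributes such as clf.support_vectors_ and clf.dual_coef_.
```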
Let’s start by training a simple Support Vector Machine (SVM) classifier on the wine dataset. First, we need to import the required modules and classes:
```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
```
The wine dataset is a built-in dataset in scikit-learn. We can load the features and target labels as follows:
```python
wine = datasets.load_wine()
X = wine.data
y = wine.target
```
Next, we split the dataset into training and testing sets. Here, we use a test size of 0.2, which means 80% of the data is used for training and 20% for testing:
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=24)
```
Now, we can create a baseline SVM classifier, fit the model to the training dataset, and evaluate its performance on the test set:
```python
baseline_svm = SVC()
baseline_svm.fit(X_train, y_train)
y_pred = baseline_svm.predict(X_test)
```
Since this is a multiclass classification problem, we can evaluate the model using accuracy:
```python
accuracy = accuracy_score(y_test, y_pred)
print(f"Baseline SVM Accuracy: {accuracy:.2f}")
```
The accuracy score of the model with the default hyperparameters is approximately 0.78. However, using a single train-test split may not provide an accurate assessment of the model’s performance. We need a better way to evaluate the model and find the best hyperparameters. This is where cross-validation comes in.
In hyperparameter tuning, we aim to find the best combination of hyperparameter values for our SVM classifier. The commonly tuned hyperparameters for the support vector classifier include: C, kernel, and gamma.
Cross-validation helps assess how well the model generalizes to unseen data and reduces the risk of overfitting. The k-fold cross-validation technique splits the dataset into k equally sized folds. The model is trained k times, with each fold serving as the validation set once and the remaining folds as the training set. This yields an accuracy score for each fold, and averaging these scores gives a more reliable estimate of the model’s performance than a single train-test split.
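For instance, a minimal sketch of 5-fold cross-validation on the baseline classifier could look like this (cross_val_score is not used in the tuning code below, which instead relies on the cv argument of the search classes):

```python
from sklearn.model_selection import cross_val_score

# 5-fold cross-validation on the training data: each fold is used once
# as the validation set while the other four folds are used for training.
cv_scores = cross_val_score(baseline_svm, X_train, y_train, cv=5)
print(f"Cross-validation accuracies: {cv_scores}")
print(f"Mean CV accuracy: {cv_scores.mean():.2f}")
```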
Grid search is a hyperparameter tuning technique that performs an exhaustive search over a specified hyperparameter space to find the combination of hyperparameters that yields the best model performance. To implement grid search in scikit-learn, we need to import the GridSearchCV class:
```python
from sklearn.model_selection import GridSearchCV
```
We define the hyperparameter search space as a parameter grid, where we specify each hyperparameter and its corresponding values to explore. For example:
```python
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': [0.1, 1, 'scale', 'auto']
}
```
Grid search systematically explores every possible combination of hyperparameters from the parameter grid. We can then instantiate the GridSearchCV object to tune the hyperparameters of the baseline SVM:
```python
grid_search = GridSearchCV(estimator=baseline_svm, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)
```
After fitting the model with the grid of hyperparameters, we can evaluate the performance of the best model on the test data:
```python
best_params = grid_search.best_params_
best_model = grid_search.best_estimator_

y_pred_best = best_model.predict(X_test)
accuracy_best = accuracy_score(y_test, y_pred_best)
print(f"Best SVM Accuracy: {accuracy_best:.2f}")
print(f"Best Hyperparameters: {best_params}")
```
Using grid search, we find that the model achieves an accuracy score of 0.94 with the best hyperparameters: C=0.1, gamma=0.1, kernel='poly'.
Grid search explores all specified combinations of hyperparameters, ensuring that we don’t miss the best hyperparameters within the defined search space. However, it can be computationally expensive for complex models or extensive hyperparameter searches.
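To make that cost concrete, the grid above contains 3 × 3 × 4 = 36 hyperparameter combinations, and with 5-fold cross-validation each one is fitted 5 times, for 180 model fits in total. A small sketch to count this (the ParameterGrid helper and the check against cv_results_ are additions, not part of the original code):

```python
from sklearn.model_selection import ParameterGrid

# Number of hyperparameter combinations in the grid: 3 * 3 * 4 = 36.
n_candidates = len(list(ParameterGrid(param_grid)))

# With cv=5, every candidate is fitted once per fold.
print(f"Candidates evaluated: {n_candidates}")
print(f"Total model fits: {n_candidates * 5}")

# The same candidate count is reflected in the fitted search object:
print(len(grid_search.cv_results_["params"]))  # 36
```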
Randomized search is another hyperparameter tuning technique that explores random combinations of hyperparameters within specified distributions or ranges. It is particularly useful when dealing with a large hyperparameter search space. To implement randomized search, we need to import the RandomizedSearchCV class:
```python
from sklearn.model_selection import RandomizedSearchCV
```
Instead of specifying a grid of values, we can define probability distributions or ranges for each hyperparameter. For example:
```python
import numpy as np
from scipy.stats import uniform

param_dist = {
    'C': uniform(0.1, 10),  # uniform distribution with loc=0.1, scale=10, i.e. values in [0.1, 10.1)
    'kernel': ['linear', 'rbf', 'poly'],
    'gamma': ['scale', 'auto'] + list(np.logspace(-3, 3, 50))
}
```
Randomized search randomly samples a fixed number of combinations of hyperparameters from these distributions. We can then instantiate the RandomizedSearchCV object to search for the best hyperparameters:
```python
randomized_search = RandomizedSearchCV(estimator=baseline_svm, param_distributions=param_dist, n_iter=20, cv=5)
randomized_search.fit(X_train, y_train)
```
After fitting the model with the random hyperparameter combinations, we can evaluate the performance of the best model on the test data:
```python
best_params_rand = randomized_search.best_params_
best_model_rand = randomized_search.best_estimator_

y_pred_best_rand = best_model_rand.predict(X_test)
accuracy_best_rand = accuracy_score(y_test, y_pred_best_rand)
print(f"Best SVM Accuracy: {accuracy_best_rand:.2f}")
print(f"Best Hyperparameters: {best_params_rand}")
```
Using randomized search, we find the same best accuracy score of 0.94, but with different optimal hyperparameters: C=9.66495227534876, gamma=6.25055192527397, kernel='poly'.
Randomized search allows us to explore a diverse set of hyperparameter combinations efficiently, especially when dealing with a large search space.
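As a rough comparison (an addition to the original walkthrough), with n_iter=20 and cv=5 the randomized search fits only 20 × 5 = 100 models, versus the 180 fits of the exhaustive grid above, while still sampling C and gamma from much wider ranges:

```python
# Number of sampled candidates equals n_iter (20 here), regardless of
# how large the underlying search space is.
n_sampled = len(randomized_search.cv_results_["params"])
print(f"Candidates sampled: {n_sampled}")    # 20
print(f"Total model fits: {n_sampled * 5}")  # 100 with cv=5

# Best cross-validated accuracy found by the randomized search:
print(f"Best CV accuracy: {randomized_search.best_score_:.2f}")
```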
In conclusion, grid search and randomized search are two techniques used for hyperparameter tuning in machine learning models. Grid search exhaustively searches through all combinations of hyperparameters, while randomized search randomly samples combinations from specified distributions or ranges. Both techniques can help find the best hyperparameters for a model, but randomized search is particularly useful for large search spaces.