This walkthrough shows how to tune the hyperparameters of a simple machine learning model with grid search and random search, using scikit-learn's GridSearchCV and RandomizedSearchCV.
Step 1: Setup and Import Necessary Libraries
First, we import the required libraries.
# Import libraries
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint
numpy provides numerical utilities; it is imported here for convenience, although the steps below do not call it directly.
load_iris from sklearn.datasets loads the Iris dataset.
train_test_split divides the dataset into training and testing sets.
GridSearchCV and RandomizedSearchCV perform the hyperparameter tuning.
RandomForestClassifier is our machine learning model.
randint from scipy.stats defines the integer distributions that random search samples from.
Step 2: Load and Prepare the Dataset
The Iris dataset will be loaded and split into training and testing sets.
# Load the dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
This loads the Iris dataset and assigns the features to X and the class labels to y.
We then divide the dataset into training and testing sets with an 80/20 split; random_state=42 makes the split reproducible.
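As a quick check of the split described above, the following sketch prints the shapes of the resulting arrays. It also passes stratify=y, which is an optional addition that the original code does not use; it keeps the class proportions the same in both sets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Same 80/20 split as above; stratify=y is an assumption added for illustration
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Training set shape:", X_train.shape)
print("Testing set shape:", X_test.shape)
# Count how many samples of each class ended up in the test set
print("Test class counts:", np.bincount(y_test))
```

With 150 samples in the Iris dataset, an 80/20 split leaves 120 samples for training and 30 for testing.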
Step 3: Define the Model and Hyperparameters
We define the random forest model and the hyperparameter search spaces for grid search and random search.
# Define the model
model = RandomForestClassifier()
# Define the hyperparameters for grid search
param_grid = {
    'n_estimators': [10, 50, 100],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}
# Define the hyperparameters for random search
param_dist = {
    'n_estimators': randint(10, 200),
    'max_depth': [None, 10, 20, 30, 40, 50],
    'min_samples_split': randint(2, 11),
    'min_samples_leaf': randint(1, 5)
}
RandomForestClassifier is our model.
The grid search space lists explicit values for n_estimators, max_depth, min_samples_split, and min_samples_leaf.
The random search space covers the same hyperparameters but uses wider ranges and distributions. Note that randint(low, high) excludes the upper bound, so randint(2, 11) draws integers from 2 to 10.
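To get a feel for the size of these search spaces, the sketch below counts the grid combinations with scikit-learn's ParameterGrid and checks the bounds of one of the randint distributions. The variable names n_combinations and samples are illustrative and are not part of the original code.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterGrid

param_grid = {
    'n_estimators': [10, 50, 100],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}

# Every combination of the listed values: 3 * 4 * 3 * 3 = 108
n_combinations = len(ParameterGrid(param_grid))
print("Grid combinations:", n_combinations)

# With 5-fold cross-validation, each combination is fitted 5 times
print("Model fits with cv=5:", n_combinations * 5)

# randint(low, high) draws integers from low up to high - 1
samples = randint(2, 11).rvs(size=1000, random_state=0)
print("Smallest and largest draw:", samples.min(), samples.max())
```

The grid grows multiplicatively with every added hyperparameter, which is the main reason random search, with its fixed number of sampled candidates, is often preferred for larger spaces.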
Step 4: Perform Grid Search
We use grid search to find the best set of hyperparameters.
# Perform grid search
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, n_jobs=-1, verbose=2)
grid_search.fit(X_train, y_train)
# Get the best parameters from grid search
print("Best parameters from grid search:", grid_search.best_params_)
We create a GridSearchCV instance with the model, the hyperparameter grid, 5-fold cross-validation, and parallel processing (n_jobs=-1 uses all available CPU cores).
We fit the grid search on the training data.
We then print the best hyperparameters it found.
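Beyond best_params_, a fitted search also exposes best_score_ and cv_results_, which show how every candidate performed. The sketch below is a minimal, self-contained illustration; the reduced grid small_grid and the fixed random_state on the classifier are assumptions made so that it runs quickly and reproducibly, and they are not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# A deliberately small grid (illustrative only) to keep the run short
small_grid = {'n_estimators': [10, 50], 'max_depth': [None, 10]}

grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=small_grid,
    cv=5,
)
grid_search.fit(X_train, y_train)

# Mean cross-validated accuracy of the best candidate
print("Best cross-validation accuracy:", grid_search.best_score_)

# List all candidates from best to worst rank
results = grid_search.cv_results_
for i in np.argsort(results['rank_test_score']):
    print(results['params'][i], round(results['mean_test_score'][i], 4))
```

Looking at the full ranking helps judge whether the best candidate is clearly ahead or whether several settings perform about equally well.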
Step 5: Perform Random Search
We use random search to find the best set of hyperparameters.
# Perform random search
random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist, n_iter=100, cv=5, n_jobs=-1, verbose=2, random_state=42)
random_search.fit(X_train, y_train)
# Get the best parameters from random search
print("Best parameters from random search:", random_search.best_params_)
We create a RandomizedSearchCV instance with the model, the hyperparameter distributions, 100 iterations (n_iter=100 sampled candidates), 5-fold cross-validation, and parallel processing; random_state=42 makes the sampling reproducible.
We fit the random search on the training data.
We then print the best hyperparameters it found.
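To see what kind of candidates random search evaluates, the sketch below draws a few parameter settings with scikit-learn's ParameterSampler. Drawing only 5 candidates and the name candidates are choices made for this illustration; the search above uses n_iter=100.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

param_dist = {
    'n_estimators': randint(10, 200),
    'max_depth': [None, 10, 20, 30, 40, 50],
    'min_samples_split': randint(2, 11),
    'min_samples_leaf': randint(1, 5),
}

# Draw 5 random parameter settings from the distributions and lists
candidates = list(ParameterSampler(param_dist, n_iter=5, random_state=42))

for candidate in candidates:
    print(candidate)
```

Each printed dictionary is one complete hyperparameter setting: list entries are picked uniformly, and randint entries are drawn from their integer range.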
Step 6: Evaluate the Model
We evaluate the models with the best hyperparameters on the test set.
# Evaluate the model with best parameters from grid search
best_grid_model = grid_search.best_estimator_
grid_accuracy = best_grid_model.score(X_test, y_test)
print("Test accuracy with grid search:", grid_accuracy)
# Evaluate the model with best parameters from random search
best_random_model = random_search.best_estimator_
random_accuracy = best_random_model.score(X_test, y_test)
print("Test accuracy with random search:", random_accuracy)
We take the best model found by grid search (best_estimator_, which is refit on the full training set by default) and measure its accuracy on the test set.
We do the same for the best model found by random search.
Finally, we print the test accuracy for both approaches.
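For a classifier, score returns plain accuracy, which can hide which classes are being confused. The sketch below compares score with accuracy_score and prints a confusion matrix. The classifier clf, with fixed settings, is only a stand-in for best_estimator_ so that the example stays self-contained; its hyperparameters are assumptions, not results of the searches above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Stand-in for grid_search.best_estimator_ (hypothetical settings)
clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)

# For classifiers, score() is the same as accuracy on the given data
score_accuracy = clf.score(X_test, y_test)
manual_accuracy = accuracy_score(y_test, y_pred)
print("Accuracy via score():", score_accuracy)
print("Accuracy via accuracy_score():", manual_accuracy)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(cm)
```

The same calls work unchanged on best_grid_model and best_random_model from the steps above.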
Copyrights © 2024 letsupdateskills All rights reserved