XGBoost algorithm
GridSearchCV
Python3
import xgboost as xgb
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn import datasets

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the hyperparameters and their search ranges
param_grid = {
    'n_estimators': [100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.2],
    'max_depth': [3, 4, 5],
    'min_child_weight': [1, 3, 5],
    'subsample': [0.8, 0.9, 1.0],
    'colsample_bytree': [0.8, 0.9, 1.0]
}

# Create an XGBoost model
xgb_model = xgb.XGBClassifier()

# Perform GridSearchCV
grid_search = GridSearchCV(xgb_model, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Get the best hyperparameters
best_params = grid_search.best_params_

# Refit the model with the best hyperparameters on the training set
best_model = grid_search.best_estimator_
best_model.fit(X_train, y_train)

# Evaluate the best model on the test set
accuracy = best_model.score(X_test, y_test)
print(f"Best Hyperparameters: {best_params}")
print(f"Accuracy on test set: {accuracy:.2f}")
Output:
Best Hyperparameters: {'colsample_bytree': 1.0, 'learning_rate': 0.01, 'max_depth': 3, 'min_child_weight': 1, 'n_estimators': 200, 'subsample': 1.0}
Accuracy on test set: 1.00
In this output:
- The best hyperparameters found by the grid search are listed.
- The accuracy on the test set is also reported, indicating how well the best model performs on unseen data.
- The goal of the code is to find the best hyperparameters for an XGBoost classifier and evaluate its performance on the test set. The fitted search object also exposes the full cross-validation results, as sketched below.
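Beyond best_params_ and the test-set score, the fitted GridSearchCV object also records how every candidate combination performed during cross-validation. The following is a minimal sketch (assuming the grid_search object from the code above, and using pandas only for readable display) of how best_score_ and cv_results_ can be inspected.

import pandas as pd

# Mean cross-validated accuracy of the best parameter combination
print(f"Best CV accuracy: {grid_search.best_score_:.3f}")

# cv_results_ holds one row per parameter combination tried
results = pd.DataFrame(grid_search.cv_results_)

# Show the top 5 combinations ranked by mean test score
top5 = results.sort_values('rank_test_score').head(5)
print(top5[['params', 'mean_test_score', 'std_test_score']])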
Random search
Python3
import xgboost as xgb
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn import datasets

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the hyperparameter search space
param_dist = {
    'n_estimators': [100, 200, 300, 400, 500],
    'learning_rate': [0.01, 0.1, 0.2, 0.3, 0.4],
    'max_depth': [3, 4, 5, 6, 7, 8, 9, 10],
    'min_child_weight': [1, 3, 5, 7, 9],
    'subsample': [0.8, 0.9, 1.0],
    'colsample_bytree': [0.6, 0.7, 0.8, 0.9, 1.0],
    'gamma': [0, 0.1, 0.2, 0.3, 0.4],
    'lambda': [0, 0.1, 0.2, 0.3, 0.4]
}

# Create an XGBoost model
xgb_model = xgb.XGBClassifier()

# Perform RandomizedSearchCV
random_search = RandomizedSearchCV(xgb_model, param_distributions=param_dist, n_iter=100,
                                   cv=5, scoring='accuracy', random_state=42)
random_search.fit(X_train, y_train)

# Get the best hyperparameters
best_params = random_search.best_params_

# Refit the model with the best hyperparameters on the training set
best_model = random_search.best_estimator_
best_model.fit(X_train, y_train)

# Evaluate the best model on the test set
accuracy = best_model.score(X_test, y_test)
print(f"Best Hyperparameters: {best_params}")
print(f"Accuracy on test set: {accuracy:.2f}")
Output:
Best Hyperparameters: {'subsample': 0.8, 'n_estimators': 200, 'min_child_weight': 1, 'max_depth': 7, 'learning_rate': 0.01, 'lambda': 0.3, 'gamma': 0.3, 'colsample_bytree': 0.9}
Accuracy on test set: 1.00
In this output:
- The best hyperparameters found by the random search are listed.
- The accuracy on the test set is also reported, indicating how well the best model performs on unseen data.
- Randomized search explores the hyperparameter space more efficiently than grid search, especially when there are many hyperparameters to consider; the sketch below illustrates the difference in the number of candidates evaluated.
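To see why this matters: with cv=5, an exhaustive grid search over the param_dist above would have to fit every possible combination five times, whereas the randomized search samples only n_iter=100 combinations. The following sketch (reusing the same param_dist) simply counts the candidates with scikit-learn's ParameterGrid; it does not run any training.

from sklearn.model_selection import ParameterGrid

# Same search space as in the randomized-search example above
param_dist = {
    'n_estimators': [100, 200, 300, 400, 500],
    'learning_rate': [0.01, 0.1, 0.2, 0.3, 0.4],
    'max_depth': [3, 4, 5, 6, 7, 8, 9, 10],
    'min_child_weight': [1, 3, 5, 7, 9],
    'subsample': [0.8, 0.9, 1.0],
    'colsample_bytree': [0.6, 0.7, 0.8, 0.9, 1.0],
    'gamma': [0, 0.1, 0.2, 0.3, 0.4],
    'lambda': [0, 0.1, 0.2, 0.3, 0.4]
}

# An exhaustive grid search would evaluate every combination:
# 5 * 5 * 8 * 5 * 3 * 5 * 5 * 5 = 375,000 candidates (each fitted cv times)
n_grid = len(ParameterGrid(param_dist))

# The randomized search above samples only n_iter of them
n_random = 100

print(f"Exhaustive grid: {n_grid} candidates, randomized search: {n_random} candidates")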
Sklearn | Model Hyper-parameters Tuning
Hyperparameter tuning is the process of finding optimal values for a machine learning model's hyperparameters, the parameters that control the model's behaviour but are not learned during training. Tuning is an important step in model development because it can significantly improve performance on new data, but it can also be time-consuming and challenging. Scikit-learn provides several tools that help with this task, and this guide gives an overview of hyperparameter tuning in Scikit-learn.
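As a minimal sketch of how this looks with a plain scikit-learn estimator, the snippet below tunes an SVC on the Iris data with an illustrative parameter grid (the grid values are chosen only for demonstration); GridSearchCV follows the same fit / best_params_ pattern used in the XGBoost examples above.

from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Load data and hold out a test set
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Illustrative grid over SVC's own hyperparameters
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto']
}

# Exhaustive search with 5-fold cross-validation
grid_search = GridSearchCV(SVC(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# GridSearchCV refits the best estimator by default, so it can score directly
print(f"Best Hyperparameters: {grid_search.best_params_}")
print(f"Accuracy on test set: {grid_search.score(X_test, y_test):.2f}")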