Algorithms based on Bagging and Boosting
Bagging Algorithm
Bagging is a supervised learning technique that can be used for both regression and classification tasks. Here is an overview of the steps in the Bagging classifier algorithm:
- Bootstrap Sampling: Create ‘N’ bootstrap samples by drawing rows from the original training data at random with replacement, each sample typically the same size as the original dataset. Because sampling is with replacement, some rows appear multiple times in a given sample while others are left out entirely. This step ensures that the base models are trained on diverse subsets of the data (see the from-scratch sketch after this list).
- Base Model Training: For each bootstrapped sample, train a base model independently on that subset of the data. Because the base models do not depend on one another, they can be trained in parallel, which reduces training time.
- Prediction Aggregation: To make a prediction on test data, combine the predictions of all base models. For classification tasks this typically means majority voting (or weighted voting), while for regression it means averaging the predictions.
- Out-of-Bag (OOB) Evaluation: Bootstrapping leaves some samples out of the training sample of each base model. These “out-of-bag” samples can be used to estimate the model’s performance without a separate validation set or cross-validation (a scikit-learn example appears after the code output below).
- Final Prediction: After aggregating the predictions from all the base models, Bagging produces a final prediction for each instance.
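The steps above fit in a few lines of Python. Below is a minimal from-scratch sketch, assuming class labels are non-negative integers (as in Iris); the helper names fit_bagging and predict_bagging are illustrative, not a library API:
Python3
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_bagging(X, y, n_estimators=10, seed=42):
    # Steps 1 and 2: train one decision tree per bootstrap sample
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # sample row indices with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def predict_bagging(models, X):
    # Step 3: majority vote across the base models' predictions
    preds = np.array([m.predict(X) for m in models])  # shape: (n_models, n_samples)
    return np.array([np.bincount(col).argmax() for col in preds.T])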
Refer to this article – ML | Bagging classifier
Python code for a Bagging classifier using scikit-learn:
Python3
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a base classifier (e.g., a decision tree)
base_classifier = DecisionTreeClassifier()

# Create the Bagging classifier with 10 base estimators
bagging_classifier = BaggingClassifier(base_classifier, n_estimators=10, random_state=42)

# Train the ensemble and evaluate it on the test set
bagging_classifier.fit(X_train, y_train)
y_pred = bagging_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Output:
Accuracy: 1.0
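The out-of-bag samples from step 4 give a performance estimate at no extra cost. A minimal sketch, reusing X_train and y_train from the snippet above together with scikit-learn's built-in oob_score option:
Python3
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# oob_score=True scores each training sample using only the trees that never saw it
bagging_oob = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                oob_score=True, random_state=42)
bagging_oob.fit(X_train, y_train)
print("OOB score:", bagging_oob.oob_score_)  # estimate of test accuracy without a validation set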
Boosting Algorithm
Boosting is an ensemble technique that combines multiple weak learners to create a strong learner. The weak models are trained in sequence, with each new model concentrating on the examples its predecessors got wrong, so the ensemble's errors shrink round by round. One of the most well-known boosting algorithms is AdaBoost (Adaptive Boosting).
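To make the sequential idea concrete, here is a minimal sketch of stage-wise residual fitting, the mechanism at the heart of gradient boosting for regression; squared-error loss is assumed, and the helper names are illustrative:
Python3
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_residual_boosting(X, y, n_rounds=50, learning_rate=0.1):
    base = y.mean()                      # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residual = y - pred              # the errors the current ensemble still makes
        tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
        pred += learning_rate * tree.predict(X)  # each new tree corrects its predecessors
        trees.append(tree)
    return base, trees

def predict_residual_boosting(base, trees, X, learning_rate=0.1):
    return base + learning_rate * sum(tree.predict(X) for tree in trees)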
Here are a few popular boosting algorithms:
- AdaBoost (Adaptive Boosting): AdaBoost assigns different weights to data points, focusing on challenging examples in each iteration. It combines the weighted weak classifiers to make predictions (see the weight-update sketch after this list).
- Gradient Boosting: Gradient Boosting, including algorithms like Gradient Boosting Machines (GBM), XGBoost, and LightGBM, optimizes a loss function by training a sequence of weak learners to minimize the residuals between predictions and actual values, producing strong predictive models.
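For AdaBoost specifically, the per-round re-weighting can be sketched in a few lines. The snippet below assumes binary labels encoded as -1/+1 (scikit-learn's AdaBoostClassifier handles arbitrary labels for you):
Python3
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_adaboost(X, y, n_rounds=50):
    # y must contain -1/+1 labels for this sketch
    n = len(X)
    w = np.full(n, 1.0 / n)              # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # the stump's vote weight
        w *= np.exp(-alpha * y * pred)   # up-weight misclassified points
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def predict_adaboost(stumps, alphas, X):
    # Weighted vote of all stumps
    return np.sign(sum(a * s.predict(X) for s, a in zip(stumps, alphas)))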
Refer to this article – Boosting algorithms.
Python code for an AdaBoost classifier using scikit-learn:
Python3
# Import necessary libraries and modules
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the dataset
data = load_iris()
X = data.data
y = data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Weak learner: a decision stump (depth-1 tree)
base_classifier = DecisionTreeClassifier(max_depth=1)

# Create an AdaBoost classifier with the decision stump as the base estimator
adaboost_classifier = AdaBoostClassifier(base_classifier, n_estimators=50,
                                         learning_rate=1.0, random_state=42)
adaboost_classifier.fit(X_train, y_train)

# Make predictions and evaluate
y_pred = adaboost_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Output:
Accuracy: 1.0
A Comprehensive Guide to Ensemble Learning
Ensemble means ‘a collection of things’, and in machine learning terminology, ensemble learning refers to combining multiple ML models to produce a prediction that is more accurate and robust than any individual model’s. A typical ensemble trains a collection of fast base learners, such as decision trees, and lets them vote on the final answer, as the short example below shows.
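As a small illustration of letting models vote, the sketch below combines two different classifiers with scikit-learn's VotingClassifier; the choice of base estimators here is arbitrary:
Python3
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hard voting: each model casts one vote and the majority label wins
voting = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier())],
    voting="hard",
)
voting.fit(X_train, y_train)
print("Voting accuracy:", voting.score(X_test, y_test))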
Table of Contents
- What is ensemble learning with examples?
- Ensemble Learning Techniques
- Algorithms based on Bagging and Boosting
- How to stack estimators for a Classification Problem?
- Uses of Ensemble Learning
- Conclusion
- Ensemble Learning – FAQs