Algorithm based on Bagging and Boosting

Bagging Algorithm

Bagging is a supervised learning technique that can be used for both regression and classification tasks. Here is an overview of the steps in the Bagging classifier algorithm (a minimal from-scratch sketch follows the list):

  • Bootstrap Sampling: Draw ‘N’ bootstrap samples from the original training data by selecting rows at random with replacement, so each sample is the same size as the original set but repeats some rows and omits others. This step ensures that the base models are trained on diverse subsets of the data.
  • Base Model Training: For each bootstrapped sample, train a base model independently on that subset of the data. Because the models do not depend on one another, they can be trained in parallel, which improves computational efficiency and reduces training time.
  • Prediction Aggregation: To make a prediction on test data, combine the predictions of all base models. For classification this is typically majority (or weighted) voting, while for regression the predictions are averaged.
  • Out-of-Bag (OOB) Evaluation: Because of sampling with replacement, some rows are left out of the bootstrap sample used to train a particular base model. These “out-of-bag” samples can be used to estimate the model’s performance without a separate validation set or cross-validation.
  • Final Prediction: After aggregating the predictions from all the base models, Bagging produces a final prediction for each instance.
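
The bootstrap-and-vote idea can be sketched directly in a few lines. This is a minimal illustration under simple assumptions (10 estimators, decision-tree base learners, the Iris dataset), not how scikit-learn's BaggingClassifier is implemented internally:

Python3

# Minimal from-scratch sketch of bagging: bootstrap sampling + majority vote
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rng = np.random.default_rng(42)
n_estimators = 10  # arbitrary choice for illustration
models = []

for _ in range(n_estimators):
    # Bootstrap sample: draw len(X_train) row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    models.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Aggregate by majority vote across the base models' class predictions
all_preds = np.stack([m.predict(X_test) for m in models])
majority = np.apply_along_axis(lambda votes: np.bincount(votes).argmax(), 0, all_preds)
print("Bagged accuracy:", (majority == y_test).mean())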

Refer to this article – ML | Bagging classifier

Python code for a Bagging classifier using scikit-learn:

Python3

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a base classifier (e.g., Decision Tree)
base_classifier = DecisionTreeClassifier()

# Create a Bagging classifier with 10 copies of the base classifier
bagging_classifier = BaggingClassifier(base_classifier, n_estimators=10, random_state=42)

# Train the ensemble on the training data
bagging_classifier.fit(X_train, y_train)

# Make predictions on the test set and evaluate accuracy
y_pred = bagging_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


Output:

Accuracy: 1.0
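
The out-of-bag evaluation described earlier is also available directly from the library. A minimal sketch, assuming the same Iris split as above and an arbitrary choice of 50 estimators (more estimators make the OOB estimate more stable):

Python3

# Out-of-bag evaluation: rows left out of each bootstrap sample act as a built-in validation set
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

bagging_oob = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                oob_score=True, random_state=42)
bagging_oob.fit(X_train, y_train)
print("OOB score:", bagging_oob.oob_score_)  # estimated accuracy without a separate validation set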

Boosting Algorithm

Boosting is an ensemble technique that combines multiple weak learners to create a strong learner. The weak models are trained in sequence, with each new model trying to correct the errors of the models before it, so the ensemble gradually improves on the examples it previously got wrong. One of the most well-known boosting algorithms is AdaBoost (Adaptive Boosting).

Here are a few popular boosting algorithms and frameworks (a minimal sketch of AdaBoost’s reweighting loop follows the list):

  • AdaBoost (Adaptive Boosting): AdaBoost assigns different weights to data points, focusing on challenging examples in each iteration. It combines weighted weak classifiers to make predictions.
  • Gradient Boosting: Gradient Boosting, including algorithms like Gradient Boosting Machines (GBM), XGBoost, and LightGBM, optimizes a loss function by training a sequence of weak learners to minimize the residuals between predictions and actual values, producing strong predictive models.
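
To make the reweighting idea concrete, here is a minimal from-scratch sketch of the classic binary AdaBoost loop. It assumes labels encoded as -1/+1, decision stumps as the weak learners, and the breast-cancer dataset purely for illustration; in practice scikit-learn's AdaBoostClassifier (shown below) is the usual choice:

Python3

# Minimal from-scratch sketch of AdaBoost's reweighting loop (binary labels in {-1, +1})
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
y = np.where(y == 0, -1, 1)  # relabel classes to {-1, +1}
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

n_rounds = 20
w = np.full(len(X_train), 1 / len(X_train))  # start with uniform sample weights
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train, sample_weight=w)
    pred = stump.predict(X_train)
    err = w[pred != y_train].sum()                   # weighted training error
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # this stump's vote weight
    w *= np.exp(-alpha * y_train * pred)             # up-weight misclassified examples
    w /= w.sum()                                     # renormalise the weights
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted sum of the weak learners' votes
scores = sum(a * s.predict(X_test) for a, s in zip(alphas, stumps))
print("AdaBoost accuracy:", (np.sign(scores) == y_test).mean())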

Refer to this article – Boosting algorithms.

Python code for an AdaBoost (boosting) classifier using scikit-learn:

Python3

# Import necessary libraries and modules
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the dataset
data = load_iris()
X = data.data
y = data.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a weak learner (a decision stump) as the base classifier
base_classifier = DecisionTreeClassifier(max_depth=1)

# Create an AdaBoost Classifier with the decision stump as the base classifier
adaboost_classifier = AdaBoostClassifier(base_classifier, n_estimators=50, learning_rate=1.0, random_state=42)

# Train the ensemble on the training data
adaboost_classifier.fit(X_train, y_train)

# Make predictions on the test set and evaluate accuracy
y_pred = adaboost_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


Output:

Accuracy: 1.0
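
For the gradient-boosting family mentioned above, scikit-learn's GradientBoostingClassifier follows the same fit/predict pattern. A minimal sketch, with arbitrary hyperparameters chosen only for illustration (XGBoost and LightGBM offer similar APIs through separate packages):

Python3

# Minimal sketch of gradient boosting: each new tree fits the residual errors of the current ensemble
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gbm.fit(X_train, y_train)
print("Gradient boosting accuracy:", gbm.score(X_test, y_test))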

A Comprehensive Guide to Ensemble Learning

Ensemble means ‘a collection of things’, and in machine learning terminology, ensemble learning refers to combining multiple ML models to produce a prediction that is more accurate and robust than any individual model. A common pattern is to train an ensemble of fast learners (classifiers) such as decision trees and let them vote on the final answer.

Table of Content

  • What is ensemble learning with examples?
  • Ensemble Learning Techniques
  • Algorithm based on Bagging and Boosting
  • How to stack estimators for a Classification Problem?
  • Uses of Ensemble Learning
  • Conclusion
  • Ensemble Learning – FAQs


What is Ensemble Learning with examples?

Ensemble learning is a machine learning technique that combines the predictions from multiple individual models to obtain better predictive performance than any single model. The basic idea is to leverage the wisdom of the crowd by aggregating the predictions of several models, each of which may have its own strengths and weaknesses; this can lead to improved accuracy and generalization. Ensemble learning trades extra computation for accuracy: an ensemble is more expensive to train than a single model, but it can compensate for the weaknesses of its individual learners.

Several individual base models (experts) are fitted to the same data, and their outputs are aggregated into a final decision, as the short sketch below illustrates. These base models can be decision trees (the most common choice), linear models, support vector machines (SVMs), neural networks, or any other model capable of making predictions. The most commonly used ensemble techniques include Bagging, used to build the Random Forest algorithm, and Boosting, which underlies algorithms such as AdaBoost and XGBoost.

In this article we give a comprehensive overview of why ensemble learning works, the different types of ensemble classifiers, advanced ensemble learning techniques, some representative algorithms (such as random forest and XGBoost), and their uses in practice.
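
As a concrete sketch of several experts voting on the same data, here is a minimal hard-voting ensemble; the three base models (a decision tree, logistic regression, and an SVM) and the Iris dataset are arbitrary choices for illustration:

Python3

# Minimal sketch of the "wisdom of the crowd": three different models each get one vote per prediction
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

voting = VotingClassifier(estimators=[
    ("tree", DecisionTreeClassifier(random_state=42)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", SVC()),
], voting="hard")  # hard voting: the majority class across the three models wins

voting.fit(X_train, y_train)
print("Voting ensemble accuracy:", voting.score(X_test, y_test))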

Ensemble Learning Techniques

  • Gradient Boosting Machines (GBM): A popular ensemble learning technique that sequentially builds a group of decision trees, each one correcting the residual errors made by the previous trees, which enhances predictive accuracy. Because each new weak learner is fitted to the residuals of the previous ensemble’s predictions, the method is less sensitive to individual data points or outliers.
  • Extreme Gradient Boosting (XGBoost): XGBoost adds tree pruning, regularization, and parallel processing, which makes it a preferred choice for data scientists seeking robust and accurate predictive models.
  • CatBoost: Designed to handle categorical features natively, eliminating the need for extensive pre-processing. CatBoost is known for its high predictive accuracy, fast training, and automatic handling of overfitting.
  • Stacking: Combines the outputs of multiple base models by training a combiner (an algorithm that takes the base models’ predictions as input) to generate a more accurate prediction. Stacking allows great flexibility in combining diverse models, and the combiner can be any machine learning algorithm.
  • Random Subspace Method (Random Subspace Ensembles): Improves predictive accuracy by training base models on random subsets of the input features, which mitigates overfitting and improves generalization by introducing diversity into the model space (see the sketch after this list).
  • Random Forest Variants: Introduce variations in tree construction, feature selection, or model optimization to enhance performance.
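
As a hedged illustration of the random subspace idea, scikit-learn's BaggingClassifier can be configured to sample features instead of rows; the 50% feature fraction and 20 estimators below are arbitrary assumptions:

Python3

# Minimal sketch of the Random Subspace method: each tree sees a random subset of the features
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

random_subspace = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=20,
    max_features=0.5,   # each tree is trained on a random half of the features
    bootstrap=False,    # keep all rows; only the feature set is randomised
    random_state=42,
)
random_subspace.fit(X_train, y_train)
print("Random subspace accuracy:", random_subspace.score(X_test, y_test))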


How to stack estimators for a Classification Problem?

  • Diversify your choice of base models: Start by choosing a diverse mix of base classifiers for your classification task, such as decision trees, support vector machines, random forests, or logistic regression. Varying your base models often leads to a more robust stacked result.
  • Data partitioning: Split your labeled dataset into at least two parts: a training set and a separate validation set. The training set is used to train the base models, and the validation set is used to generate new features from their predictions.
  • Train the base models: Train each base model on the training set, making sure all base models are trained on the same set of features and labels for consistency.
  • Generate predictions: Make predictions on the validation set with the trained base models. These predictions become the input features for the next step.
  • Develop a meta-learner: Select a meta-learner, such as logistic regression, and train it on the base models’ predictions.
  • Final inference: Once the meta-learner is trained, use it to make final predictions on new, unseen data by combining the base models’ predictions into input features for the meta-learner.
  • Evaluate and refine: Assess the performance of the stacked ensemble with metrics such as accuracy, precision, recall, or F1-score, and fine-tune it by adjusting the meta-learner or adding further base models as needed (a minimal scikit-learn sketch of this workflow follows the list).
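
The workflow above maps closely onto scikit-learn's StackingClassifier, which handles the data partitioning (via internal cross-validation) and the meta-learner training automatically. A minimal sketch, with the base models and the logistic-regression meta-learner chosen only for illustration:

Python3

# Minimal sketch of stacking: diverse base models + a logistic-regression meta-learner.
# StackingClassifier builds the meta-features from out-of-fold predictions (cv=5) internally.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("svm", SVC(probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # the meta-learner
    cv=5,
)
stack.fit(X_train, y_train)
print("Stacked ensemble accuracy:", stack.score(X_test, y_test))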

Ensemble Learning – FAQs

Ensemble learning is a versatile approach that can be applied to a wide range of machine learning problems such as:-...