Algorithm based on Bagging and Boosting

Bagging Algorithm

Bagging is a supervised learning technique that can be used for both regression and classification tasks. Here is an overview of the steps in the Bagging classifier algorithm (a minimal from-scratch sketch follows the list):

  • Bootstrap Sampling: Draw ‘N’ bootstrap samples from the original training data by selecting rows at random with replacement, so each sample is the same size as the original set but repeats some rows and omits others. This step ensures that the base models are trained on diverse subsets of the data.
  • Base Model Training: For each bootstrapped sample, train a base model independently on that subset of the data. Because the models do not depend on one another, they can be trained in parallel, which improves computational efficiency and reduces training time.
  • Prediction Aggregation: To make a prediction on test data, combine the predictions of all base models. For classification this is typically majority (or weighted) voting, while for regression the predictions are averaged.
  • Out-of-Bag (OOB) Evaluation: Because of sampling with replacement, some rows are left out of the bootstrap sample used to train a particular base model. These “out-of-bag” samples can be used to estimate the model’s performance without a separate validation set or cross-validation.
  • Final Prediction: After aggregating the predictions from all the base models, Bagging produces a final prediction for each instance.
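
The bootstrap-and-vote idea can be sketched directly in a few lines. This is a minimal illustration under simple assumptions (10 estimators, decision-tree base learners, the Iris dataset), not how scikit-learn's BaggingClassifier is implemented internally:

Python3

# Minimal from-scratch sketch of bagging: bootstrap sampling + majority vote
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

rng = np.random.default_rng(42)
n_estimators = 10  # arbitrary choice for illustration
models = []

for _ in range(n_estimators):
    # Bootstrap sample: draw len(X_train) row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    models.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Aggregate by majority vote across the base models' class predictions
all_preds = np.stack([m.predict(X_test) for m in models])
majority = np.apply_along_axis(lambda votes: np.bincount(votes).argmax(), 0, all_preds)
print("Bagged accuracy:", (majority == y_test).mean())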

Refer to this article – ML | Bagging classifier

Python code for a Bagging classifier using scikit-learn:

Python3

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a base classifier (e.g., Decision Tree)
base_classifier = DecisionTreeClassifier()

# Create a Bagging classifier with 10 copies of the base classifier
bagging_classifier = BaggingClassifier(base_classifier, n_estimators=10, random_state=42)

# Train the ensemble on the training data
bagging_classifier.fit(X_train, y_train)

# Make predictions on the test set and evaluate accuracy
y_pred = bagging_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


Output:

Accuracy: 1.0
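
The out-of-bag evaluation described earlier is also available directly from the library. A minimal sketch, assuming the same Iris split as above and an arbitrary choice of 50 estimators (more estimators make the OOB estimate more stable):

Python3

# Out-of-bag evaluation: rows left out of each bootstrap sample act as a built-in validation set
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

bagging_oob = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                oob_score=True, random_state=42)
bagging_oob.fit(X_train, y_train)
print("OOB score:", bagging_oob.oob_score_)  # estimated accuracy without a separate validation set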

Boosting Algorithm

Boosting is an ensemble technique that combines multiple weak learners to create a strong learner. The weak models are trained in sequence, with each new model trying to correct the errors of the models before it, so the ensemble gradually improves on the examples it previously got wrong. One of the most well-known boosting algorithms is AdaBoost (Adaptive Boosting).

Here are a few popular boosting algorithms and frameworks (a minimal sketch of AdaBoost’s reweighting loop follows the list):

  • AdaBoost (Adaptive Boosting): AdaBoost assigns different weights to data points, focusing on challenging examples in each iteration. It combines weighted weak classifiers to make predictions.
  • Gradient Boosting: Gradient Boosting, including algorithms like Gradient Boosting Machines (GBM), XGBoost, and LightGBM, optimizes a loss function by training a sequence of weak learners to minimize the residuals between predictions and actual values, producing strong predictive models.
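
To make the reweighting idea concrete, here is a minimal from-scratch sketch of the classic binary AdaBoost loop. It assumes labels encoded as -1/+1, decision stumps as the weak learners, and the breast-cancer dataset purely for illustration; in practice scikit-learn's AdaBoostClassifier (shown below) is the usual choice:

Python3

# Minimal from-scratch sketch of AdaBoost's reweighting loop (binary labels in {-1, +1})
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
y = np.where(y == 0, -1, 1)  # relabel classes to {-1, +1}
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

n_rounds = 20
w = np.full(len(X_train), 1 / len(X_train))  # start with uniform sample weights
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train, sample_weight=w)
    pred = stump.predict(X_train)
    err = w[pred != y_train].sum()                   # weighted training error
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # this stump's vote weight
    w *= np.exp(-alpha * y_train * pred)             # up-weight misclassified examples
    w /= w.sum()                                     # renormalise the weights
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted sum of the weak learners' votes
scores = sum(a * s.predict(X_test) for a, s in zip(alphas, stumps))
print("AdaBoost accuracy:", (np.sign(scores) == y_test).mean())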

Refer to this article – Boosting algorithms.

Python code for an AdaBoost (boosting) classifier using scikit-learn:

Python3

# Import necessary libraries and modules
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the dataset
data = load_iris()
X = data.data
y = data.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a weak learner (a decision stump) as the base classifier
base_classifier = DecisionTreeClassifier(max_depth=1)

# Create an AdaBoost Classifier with the decision stump as the base classifier
adaboost_classifier = AdaBoostClassifier(base_classifier, n_estimators=50, learning_rate=1.0, random_state=42)

# Train the ensemble on the training data
adaboost_classifier.fit(X_train, y_train)

# Make predictions on the test set and evaluate accuracy
y_pred = adaboost_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


Output:

Accuracy: 1.0
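
For the gradient-boosting family mentioned above, scikit-learn's GradientBoostingClassifier follows the same fit/predict pattern. A minimal sketch, with arbitrary hyperparameters chosen only for illustration (XGBoost and LightGBM offer similar APIs through separate packages):

Python3

# Minimal sketch of gradient boosting: each new tree fits the residual errors of the current ensemble
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
gbm.fit(X_train, y_train)
print("Gradient boosting accuracy:", gbm.score(X_test, y_test))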

A Comprehensive Guide to Ensemble Learning

Ensemble means ‘a collection of things’, and in machine learning terminology, ensemble learning refers to combining multiple ML models to produce a prediction that is more accurate and robust than any individual model. A common pattern is to train an ensemble of fast learners (classifiers) such as decision trees and let them vote on the final answer.

Table of Content

  • What is ensemble learning with examples?
  • Ensemble Learning Techniques
  • Algorithm based on Bagging and Boosting
  • How to stack estimators for a Classification Problem?
  • Uses of Ensemble Learning
  • Conclusion
  • Ensemble Learning – FAQs


What is Ensemble Learning with examples?

Ensemble learning is a machine learning technique that combines the predictions from multiple individual models to obtain better predictive performance than any single model. The basic idea is to leverage the wisdom of the crowd by aggregating the predictions of several models, each of which may have its own strengths and weaknesses; this can lead to improved accuracy and generalization. Ensemble learning trades extra computation for accuracy: an ensemble is more expensive to train than a single model, but it can compensate for the weaknesses of its individual learners.

Several individual base models (experts) are fitted to the same data, and their outputs are aggregated into a final decision, as the short sketch below illustrates. These base models can be decision trees (the most common choice), linear models, support vector machines (SVMs), neural networks, or any other model capable of making predictions. The most commonly used ensemble techniques include Bagging, used to build the Random Forest algorithm, and Boosting, which underlies algorithms such as AdaBoost and XGBoost.

In this article we give a comprehensive overview of why ensemble learning works, the different types of ensemble classifiers, advanced ensemble learning techniques, some representative algorithms (such as random forest and XGBoost), and their uses in practice.
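
As a concrete sketch of several experts voting on the same data, here is a minimal hard-voting ensemble; the three base models (a decision tree, logistic regression, and an SVM) and the Iris dataset are arbitrary choices for illustration:

Python3

# Minimal sketch of the "wisdom of the crowd": three different models each get one vote per prediction
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

voting = VotingClassifier(estimators=[
    ("tree", DecisionTreeClassifier(random_state=42)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", SVC()),
], voting="hard")  # hard voting: the majority class across the three models wins

voting.fit(X_train, y_train)
print("Voting ensemble accuracy:", voting.score(X_test, y_test))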

Ensemble Learning Techniques

  • Gradient Boosting Machines (GBM): A popular ensemble learning technique that sequentially builds a group of decision trees, each one correcting the residual errors made by the previous trees, which enhances predictive accuracy. Because each new weak learner is fitted to the residuals of the previous ensemble’s predictions, the method is less sensitive to individual data points or outliers.
  • Extreme Gradient Boosting (XGBoost): XGBoost adds tree pruning, regularization, and parallel processing, which makes it a preferred choice for data scientists seeking robust and accurate predictive models.
  • CatBoost: Designed to handle categorical features natively, eliminating the need for extensive pre-processing. CatBoost is known for its high predictive accuracy, fast training, and automatic handling of overfitting.
  • Stacking: Combines the outputs of multiple base models by training a combiner (an algorithm that takes the base models’ predictions as input) to generate a more accurate prediction. Stacking allows great flexibility in combining diverse models, and the combiner can be any machine learning algorithm.
  • Random Subspace Method (Random Subspace Ensembles): Improves predictive accuracy by training base models on random subsets of the input features, which mitigates overfitting and improves generalization by introducing diversity into the model space (see the sketch after this list).
  • Random Forest Variants: Introduce variations in tree construction, feature selection, or model optimization to enhance performance.
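
As a hedged illustration of the random subspace idea, scikit-learn's BaggingClassifier can be configured to sample features instead of rows; the 50% feature fraction and 20 estimators below are arbitrary assumptions:

Python3

# Minimal sketch of the Random Subspace method: each tree sees a random subset of the features
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

random_subspace = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=20,
    max_features=0.5,   # each tree is trained on a random half of the features
    bootstrap=False,    # keep all rows; only the feature set is randomised
    random_state=42,
)
random_subspace.fit(X_train, y_train)
print("Random subspace accuracy:", random_subspace.score(X_test, y_test))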


How to stack estimators for a Classification Problem?

  • Diversify your choice of base models: Start by choosing a diverse mix of base classifiers for your classification task, such as decision trees, support vector machines, random forests, or logistic regression. Varying your base models often leads to a more robust stacked result.
  • Data partitioning: Split your labeled dataset into at least two parts: a training set and a separate validation set. The training set is used to train the base models, and the validation set is used to generate new features from their predictions.
  • Train the base models: Train each base model on the training set, making sure all base models are trained on the same set of features and labels for consistency.
  • Generate predictions: Make predictions on the validation set with the trained base models. These predictions become the input features for the next step.
  • Develop a meta-learner: Select a meta-learner, such as logistic regression, and train it on the base models’ predictions.
  • Final inference: Once the meta-learner is trained, use it to make final predictions on new, unseen data by combining the base models’ predictions into input features for the meta-learner.
  • Evaluate and refine: Assess the performance of the stacked ensemble with metrics such as accuracy, precision, recall, or F1-score, and fine-tune it by adjusting the meta-learner or adding further base models as needed (a minimal scikit-learn sketch of this workflow follows the list).
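
The workflow above maps closely onto scikit-learn's StackingClassifier, which handles the data partitioning (via internal cross-validation) and the meta-learner training automatically. A minimal sketch, with the base models and the logistic-regression meta-learner chosen only for illustration:

Python3

# Minimal sketch of stacking: diverse base models + a logistic-regression meta-learner.
# StackingClassifier builds the meta-features from out-of-fold predictions (cv=5) internally.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("svm", SVC(probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # the meta-learner
    cv=5,
)
stack.fit(X_train, y_train)
print("Stacked ensemble accuracy:", stack.score(X_test, y_test))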

Ensemble Learning – FAQs

Ensemble learning is a versatile approach that can be applied to a wide range of machine learning problems such as:-...