Bootstrapping Approach In XGBoost

1. Importing Necessary Libraries And Generating Synthetic Data

We import the required libraries and generate a synthetic dataset: 100 training samples with 10 features each, random targets, and 20 test samples. The data carries no real signal; it serves only to demonstrate the workflow.

Python
import xgboost as xgb
import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data
np.random.seed(42)
X_train = np.random.rand(100, 10)
y_train = np.random.rand(100)
X_test = np.random.rand(20, 10)

2. Bootstrapping

We fit n_iterations XGBoost models, each trained on a bootstrap resample of the training data (drawn with replacement), and collect their predictions on the test set. The spread across these predictions is then used to form confidence intervals.

Python
n_iterations = 100  # Number of bootstrapped models
predictions = []

for i in range(n_iterations):
    # Create a bootstrapped dataset
    indices = np.random.choice(len(X_train), len(X_train), replace=True)
    X_resampled, y_resampled = X_train[indices], y_train[indices]
    
    # Train an XGBoost model
    model = xgb.XGBRegressor()
    model.fit(X_resampled, y_resampled)
    
    # Predict on test data
    preds = model.predict(X_test)
    predictions.append(preds)

# Convert predictions to a NumPy array
predictions = np.array(predictions)

# Calculate the mean and standard deviation of the predictions
mean_preds = np.mean(predictions, axis=0)
std_preds = np.std(predictions, axis=0)

# 95% confidence intervals (normal approximation: mean ± 1.96 standard deviations)
lower_bound = mean_preds - 1.96 * std_preds
upper_bound = mean_preds + 1.96 * std_preds
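
Note that the ±1.96 multiplier assumes the bootstrapped predictions are approximately normally distributed. As a sketch of a distribution-free alternative, we can instead take empirical percentiles of the predictions array built above:

Python
# Alternative: percentile bootstrap interval (no normality assumption)
lower_bound_pct = np.percentile(predictions, 2.5, axis=0)   # 2.5th percentile per test point
upper_bound_pct = np.percentile(predictions, 97.5, axis=0)  # 97.5th percentile per test point

With only 100 resamples the percentile estimates are coarse; increasing n_iterations tightens them at the cost of extra training time.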

3. Visualizing the Results

The results are visualized by plotting the mean prediction for each test point and shading the region between the lower and upper bounds, illustrating the 95% confidence interval around the mean predictions.

Python
# Visualization
plt.figure(figsize=(10, 6))
plt.plot(mean_preds, label='Mean Prediction', color='blue')
plt.fill_between(range(len(mean_preds)), lower_bound, upper_bound, color='gray', alpha=0.5, label='95% Confidence Interval')
plt.title('Bootstrapped 95% Confidence Interval')
plt.xlabel('Test Data Points')
plt.ylabel('Predictions')
plt.legend()
plt.show()


Output: a line plot of the mean prediction across the 20 test points, with a gray band marking the bootstrapped 95% confidence interval.

By applying Quantile Regression and Bootstrapping methods, we can estimate the uncertainty of an XGBoost model's predictions. Both approaches yield intervals that indicate the range within which the true values are likely to lie, improving the interpretability and reliability of our machine learning models.
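
For reference, below is a minimal sketch of the quantile-regression approach mentioned above. It assumes XGBoost 2.0 or later, where the built-in reg:quantileerror objective and its quantile_alpha parameter are available; on older versions a custom objective would be needed.

Python
import numpy as np
import xgboost as xgb

# Same synthetic data as before
np.random.seed(42)
X_train = np.random.rand(100, 10)
y_train = np.random.rand(100)
X_test = np.random.rand(20, 10)

# Fit one model per quantile bound (requires XGBoost >= 2.0)
lower_model = xgb.XGBRegressor(objective="reg:quantileerror", quantile_alpha=0.025)
upper_model = xgb.XGBRegressor(objective="reg:quantileerror", quantile_alpha=0.975)
lower_model.fit(X_train, y_train)
upper_model.fit(X_train, y_train)

lower_q = lower_model.predict(X_test)  # estimated 2.5% quantile per test point
upper_q = upper_model.predict(X_test)  # estimated 97.5% quantile per test point

Training one model per quantile bound avoids the repeated fitting of the bootstrap loop, at the cost of the interval quality depending on how well each quantile model is tuned.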


