Bootstrapping Approach in XGBoost
1. Importing Necessary Libraries And Generating Synthetic Data
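We import XGBoost, NumPy, and Matplotlib, then generate a small synthetic regression dataset: 100 training samples and 20 test samples, each with 10 random features.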
import xgboost as xgb
import numpy as np
import matplotlib.pyplot as plt
# Generate synthetic data
np.random.seed(42)
X_train = np.random.rand(100, 10)
y_train = np.random.rand(100)
X_test = np.random.rand(20, 10)
2. Bootstrapping
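We train many XGBoost models, each on a dataset drawn with replacement from the training set, and record every model's predictions on the test data. The spread of those predictions across models gives an estimate of the uncertainty in each prediction.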
n_iterations = 100 # Number of bootstrapped models
predictions = []
for i in range(n_iterations):
    # Create a bootstrapped dataset (sample rows with replacement)
    indices = np.random.choice(len(X_train), len(X_train), replace=True)
    X_resampled, y_resampled = X_train[indices], y_train[indices]
    # Train an XGBoost model on the resampled data
    model = xgb.XGBRegressor()
    model.fit(X_resampled, y_resampled)
    # Predict on test data
    preds = model.predict(X_test)
    predictions.append(preds)
# Convert predictions to a NumPy array
predictions = np.array(predictions)
# Calculate the mean and standard deviation of the predictions
mean_preds = np.mean(predictions, axis=0)
std_preds = np.std(predictions, axis=0)
# Approximate 95% confidence intervals (normal approximation: mean +/- 1.96 std)
lower_bound = mean_preds - 1.96 * std_preds
upper_bound = mean_preds + 1.96 * std_preds
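The interval above relies on a normal approximation (mean plus or minus 1.96 standard deviations). A common alternative, sketched here, reads the 2.5th and 97.5th percentiles directly from the bootstrap predictions, which avoids assuming the predictions are symmetric around the mean. The helper name `percentile_interval` and the simulated `predictions` array are illustrative assumptions; the simulated array only stands in for the `(n_iterations, n_test)` array built in the loop.

```python
import numpy as np

def percentile_interval(predictions, alpha=0.05):
    """Return (lower, upper) percentile bounds for each test point.

    predictions: array of shape (n_models, n_test_points).
    alpha: 1 - confidence level (0.05 gives a 95% interval).
    """
    predictions = np.asarray(predictions)
    lower = np.percentile(predictions, 100 * alpha / 2, axis=0)
    upper = np.percentile(predictions, 100 * (1 - alpha / 2), axis=0)
    return lower, upper

# Simulated stand-in for bootstrap predictions: 100 models x 20 test points.
rng = np.random.default_rng(42)
predictions = rng.normal(loc=0.5, scale=0.1, size=(100, 20))

lower_pct, upper_pct = percentile_interval(predictions)
print(lower_pct.shape, upper_pct.shape)
```

With the real bootstrap output, `lower_pct` and `upper_pct` could be passed to `plt.fill_between` in place of `lower_bound` and `upper_bound`.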
3. Visualizing the Results
We plot the mean prediction for each test point as a line and shade the area between the lower and upper bounds, which shows the 95% confidence interval around each mean prediction.
# Visualization
plt.figure(figsize=(10, 6))
plt.plot(mean_preds, label='Mean Prediction', color='blue')
plt.fill_between(range(len(mean_preds)), lower_bound, upper_bound, color='gray', alpha=0.5, label='95% Confidence Interval')
plt.title('Bootstrapped 95% Confidence Interval')
plt.xlabel('Test Data Points')
plt.ylabel('Predictions')
plt.legend()
plt.show()
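Output: a line plot of the mean prediction for each of the 20 test points, with a shaded gray band marking the 95% confidence interval.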
By applying quantile regression and bootstrapping, we can estimate the uncertainty of predictions made by an XGBoost model. Both approaches produce an interval around each prediction that indicates the range in which the true value is likely to lie, which makes the model's outputs easier to interpret and to rely on.
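One way to check how reliable such intervals are is to measure empirical coverage: the share of held-out targets that fall inside their interval. The sketch below assumes held-out targets are available (the bootstrapping example generates no `y_test`); the function name `empirical_coverage` and the toy arrays are hypothetical and are not results from the model.

```python
import numpy as np

def empirical_coverage(y_true, lower, upper):
    """Fraction of true values that fall within [lower, upper]."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    inside = (y_true >= lower) & (y_true <= upper)
    return inside.mean()

# Toy values for illustration only (not output from a fitted model).
y_true = np.array([0.2, 0.5, 0.9, 0.4])
lower = np.array([0.1, 0.3, 0.5, 0.45])
upper = np.array([0.3, 0.6, 0.8, 0.60])

coverage = empirical_coverage(y_true, lower, upper)
print(f"Empirical coverage: {coverage:.2f}")
```

Coverage well below the nominal level (for example, far under 0.95 for a 95% interval) suggests the intervals are too narrow. Intervals built only from the spread of bootstrapped models reflect variability in the fitted model, not noise in the targets, so they can under-cover observed values.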
Confidence Intervals for XGBoost
Confidence intervals provide a range within which we expect the true value of a parameter to lie, with a certain level of confidence. In the context of XGBoost, confidence intervals can be used to quantify the uncertainty of predictions. In this article, we explain how to compute confidence intervals for predictions made by an XGBoost model.