Importance of Bagging Function in R
Bagging, also known as bootstrap aggregating, is a machine learning technique that reduces the variance of a model by creating multiple training sets and combining their predictions. In R it is available as the bagging function in the ipred package. Each training set is generated by randomly sampling observations from the original dataset with replacement.
The bagging function in R Programming Language accepts a number of parameters, such as the model formula, the dataset to use, the number of bootstrap replications (bags) to produce, and control settings for the model being bagged. By default, bagging fits a decision tree to each bootstrap sample; the control argument passes tuning parameters to the underlying rpart trees. Here is an example of how to use the bagging function in R:
R
library(ipred)
library(rpart)

# Load the iris dataset
data(iris)
set.seed(1)

# Fit the bagged model
bag <- bagging(
  formula = Species ~ .,
  data = iris,
  nbagg = 50,
  coob = TRUE,
  control = rpart.control(minsplit = 2, cp = 0, maxdepth = 2)
)
bag

Output:

Bagging classification trees with 50 bootstrap replications 

Call: bagging.data.frame(formula = Species ~ ., data = iris, nbagg = 50, 
    coob = TRUE, control = rpart.control(minsplit = 2, cp = 0, 
    maxdepth = 2))

Out-of-bag estimate of misclassification error:  0.06
In this example we first load the iris dataset and fit a bagged decision tree model, specifying the outcome variable and predictors through the formula Species ~ ., the number of bootstrap samples (nbagg = 50), whether to compute out-of-bag estimates (coob = TRUE), and the control parameters for the decision tree method (rpart.control(minsplit = 2, cp = 0, maxdepth = 2)). Printing the fitted model then reports the out-of-bag estimate of the misclassification error, here 0.06.
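Once fitted, the bagged model can be used for prediction with the standard predict function. A minimal sketch, re-fitting the same model as above and then predicting on the training data (the in-sample confusion table below is optimistic; the out-of-bag error is the more honest estimate):

```r
library(ipred)
library(rpart)

data(iris)
set.seed(1)

# Fit the bagged classification-tree model as in the example above
bag <- bagging(
  formula = Species ~ .,
  data = iris,
  nbagg = 50,
  coob = TRUE,
  control = rpart.control(minsplit = 2, cp = 0, maxdepth = 2)
)

# Predict classes for observations with the bagged ensemble
preds <- predict(bag, newdata = iris)

# In-sample confusion table: predicted vs. actual species
table(Predicted = preds, Actual = iris$Species)
```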
Perform Bagging in R
When building a decision tree for a given dataset, we use only a single training dataset. The drawback of a single decision tree is its high variance: if we split the dataset in half and fit a tree to each half, the two trees could produce very different results. Bagging, also known as bootstrap aggregating, is a technique we can use to lower the variance of a single decision tree.
Bagging works as follows:
- Take b bootstrapped samples from the original dataset.
- Build a decision tree for each bootstrapped sample.
- Combine the trees' predictions (averaging for regression, majority vote for classification) to get a final model.
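The three steps above can be sketched by hand with rpart. This is an illustrative re-implementation for a classification task, not the internals of ipred; the number of samples b and the tree settings are arbitrary choices for the demo:

```r
library(rpart)

data(iris)
set.seed(1)
b <- 50  # number of bootstrapped samples

# Steps 1 and 2: draw a bootstrap sample and fit one tree per sample
trees <- lapply(seq_len(b), function(i) {
  idx <- sample(nrow(iris), replace = TRUE)
  rpart(Species ~ ., data = iris[idx, ],
        control = rpart.control(minsplit = 2, cp = 0))
})

# Step 3: combine the trees' predictions by majority vote
# (for a regression task you would average the numeric predictions instead)
votes <- sapply(trees, function(tr) as.character(predict(tr, iris, type = "class")))
final <- apply(votes, 1, function(v) names(which.max(table(v))))

mean(final == iris$Species)  # resubstitution accuracy of the ensemble
```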