Why Tune hyperparameters in Decision Trees?
Different datasets and models call for different hyperparameter settings, so no single configuration works everywhere. One way to find good values is to run multiple experiments, each training the model with a different set of hyperparameters, and keep the set that performs best. This process of selecting the best-performing hyperparameters is called hyperparameter tuning.
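The "multiple experiments" idea can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn is the library in use; the Iris dataset and the list of candidate depths are placeholders chosen for the example, not values taken from this article.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset (an assumption); any labelled (X, y) pair would do.
X, y = load_iris(return_X_y=True)

# Hypothetical candidate values for a single hyperparameter.
candidate_depths = [1, 2, 3, 5, None]

# One experiment per candidate: mean 5-fold cross-validated accuracy.
mean_scores = {}
for depth in candidate_depths:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    mean_scores[depth] = cross_val_score(tree, X, y, cv=5).mean()

# Keep the candidate with the highest mean score.
best_depth = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("best max_depth:", best_depth)
```

Scoring each candidate with cross-validation rather than on the training data keeps the comparison focused on how well each setting generalizes.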
Tuning hyperparameters is crucial for decision trees for the following reasons:
- Improved Performance: Untuned hyperparameters can lead to sub-optimal decision trees. Tuning allows you to find the settings that best suit your data, resulting in a model that captures the underlying patterns more effectively and delivers better predictions.
- Reduced Overfitting: Decision trees are prone to overfitting, where the model memorizes the training data’s noise instead of learning generalizable patterns. Hyperparameter tuning helps prevent this by controlling the tree’s complexity (e.g., with max_depth) and preventing excessively fine-grained splits (e.g., with min_samples_split).
- Enhanced Generalization: The goal is for the decision tree to perform well on unseen data. Tuning hyperparameters helps achieve this by striking a balance between fitting the training data closely and keeping the model simple. A well-tuned tree can capture the important trends in the data without overfitting to the specifics of the training set, leading to better performance on new data.
- Addressing Class Imbalance: Class imbalance occurs when one class has significantly fewer samples than others. When sample weights are supplied, tuning hyperparameters like min_weight_fraction_leaf lets the tree take those weights into account so that it is less biased toward the majority class, leading to more accurate predictions for the minority class.
- Tailoring the Model to Specific Tasks: Different tasks might require different decision tree behaviors. Hyperparameter tuning allows you to customize the tree’s structure and learning process to fit the specific needs of your prediction problem. For example, you might allow a larger max_depth to capture complex relationships in a difficult classification task.
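The effect of these settings on overfitting can be observed directly. The sketch below assumes scikit-learn and uses a synthetic dataset with deliberately noisy labels; the dataset parameters and the two configurations being compared are illustrative choices, not recommendations from this article.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% label noise (flip_y), used only for illustration.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Two hypothetical configurations: library defaults vs. a restricted tree.
settings = {
    "unconstrained": {},
    "constrained": {"max_depth": 3, "min_samples_split": 10},
}

results = {}
for name, params in settings.items():
    tree = DecisionTreeClassifier(random_state=0, **params).fit(X_train, y_train)
    results[name] = {
        "depth": tree.get_depth(),
        "leaves": tree.get_n_leaves(),
        "train_acc": tree.score(X_train, y_train),
        "test_acc": tree.score(X_test, y_test),
    }
    print(name, results[name])
```

The unconstrained tree keeps splitting until every training sample is classified correctly, noise included, so a large gap between its training and test accuracy is a sign of overfitting. The constrained tree is much smaller, and comparing the two test accuracies shows whether the restriction helped on this particular dataset.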
Types of Hyperparameters in Decision Tree
Hyperparameters in decision trees are settings that control the behavior and structure of the model during the training phase. The major hyperparameters used to fine-tune a decision tree are:
- criterion: The criterion is the function that measures the quality of a split in the decision tree. Two commonly used options are gini (Gini impurity) and entropy (information gain).
- Gini index – Gini impurity, or the Gini index, measures how mixed the target classes are within a node: it is the probability that a randomly chosen sample would be misclassified if it were labeled at random according to the node’s class distribution. The node is split in the way that yields the least impurity.
- Information gain – This option uses entropy as the impurity measure and splits a node in the way that yields the largest information gain, that is, the largest reduction in entropy.
- max_depth: As the name suggests, the max_depth hyperparameter controls the maximum depth to which the decision tree is allowed to grow. A larger max_depth lets the tree capture more complex patterns in the training data, potentially reducing the training error. However, setting max_depth too high can lead to overfitting, where the model memorizes the noise in the training data. Tune max_depth carefully to find the right balance between model complexity and generalization performance. It accepts a positive integer or None, which means no maximum depth limit. Example: max_depth = 3 limits the tree to three levels of splits, giving moderate complexity and reducing the risk of overfitting.
- min_samples_split: The min_samples_split hyperparameter defines the minimum number of samples needed to split a node. It works as a threshold: if the number of samples in a node is less than min_samples_split, the node is not split and becomes a leaf node. It accepts an integer, which is the minimum number of samples required in an internal node, or a float, which is the minimum fraction of the total samples required in an internal node. Example: min_samples_split = 10 ensures a node must have at least 10 samples before splitting.
- min_samples_leaf: The min_samples_leaf hyperparameter defines the minimum number of samples that must be present at a leaf node. A split is only considered if it leaves at least min_samples_leaf samples in each of the two resulting child nodes, which stops the tree from splitting into ever smaller groups. It accepts either an integer (an absolute count) or a float (a fraction of the total number of samples). For example, min_samples_leaf = 5 ensures that each leaf node in the decision tree contains at least 5 samples; any split that would produce a smaller leaf is rejected.
- max_features: The max_features hyperparameter controls the number of features considered when looking for the best split in the decision tree. It can be given as an exact number of features to consider at each split or as a fraction of the total number of features. The input options are an integer, a float, auto, sqrt, or log2. The named options work as follows:
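- auto – In older scikit-learn releases this considered all the features at each split for regression trees, but only the square root of the number of features for classification trees. The option has since been deprecated and removed; leaving max_features at None (the default) considers all the features.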
- sqrt – The algorithm considers a number of features equal to the square root of the total number of features at each split.
- log2 – The algorithm considers a number of features equal to the base-2 logarithm of the total number of features at each split.
- min_weight_fraction_leaf: The min_weight_fraction_leaf hyperparameter controls the tree’s structure based on the weights assigned to each sample. It sets the minimum fraction of the total sample weight that must be present at a leaf node; when no sample weights are provided, all samples have equal weight, so it is simply a minimum fraction of the input samples. It can also help with class imbalance, where one class has far fewer samples than the others and sample weights are used to compensate. Criteria that are not aware of sample weights, such as min_samples_leaf, can leave the tree biased toward the majority classes. When the samples are weighted, a weight-based criterion such as min_weight_fraction_leaf makes it easier to optimize the tree structure, because it ensures that every leaf node holds at least a fraction of the overall sum of the weights. For example, min_weight_fraction_leaf = 0.1 guarantees that each leaf node in the decision tree holds at least 10% of the total sum of sample weights, which can help address class imbalance and optimize the tree structure.
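The following sketch ties the list above together. The first part computes the two impurity measures by hand with small helper functions (gini and entropy are names introduced here for illustration). The second part passes the example values from this section to scikit-learn’s DecisionTreeClassifier and inspects the fitted tree. The breast cancer dataset is an arbitrary choice, and the whole block is an illustration under those assumptions rather than a tuned model.

```python
import math
from collections import Counter

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: sum(p_k * log2(1 / p_k))."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


# A 50/50 node is maximally impure for two classes; a pure node scores zero.
mixed, pure = [0, 0, 1, 1], [1, 1, 1, 1]
print(gini(mixed), entropy(mixed))  # 0.5 1.0
print(gini(pure), entropy(pure))    # 0.0 0.0

# Illustrative dataset (an assumption): 569 samples, 30 features.
X, y = load_breast_cancer(return_X_y=True)

tree = DecisionTreeClassifier(
    criterion="entropy",           # split quality measured by information gain
    max_depth=3,                   # at most three levels of splits
    min_samples_split=10,          # a node needs 10 samples to be split
    min_samples_leaf=5,            # every leaf keeps at least 5 samples
    max_features="sqrt",           # sqrt(30) -> 5 features tried per split
    min_weight_fraction_leaf=0.1,  # every leaf holds >= 10% of total weight
    random_state=0,
).fit(X, y)

# Inspect the fitted structure to confirm the constraints were honoured.
is_leaf = tree.tree_.children_left == -1
leaf_sizes = tree.tree_.n_node_samples[is_leaf]
print("depth:", tree.get_depth())
print("smallest leaf:", leaf_sizes.min())
print("features tried per split:", tree.max_features_)
```

Because no sample weights are passed to fit, every sample counts equally, so the 10% weight fraction is the binding leaf constraint here and overrides the smaller min_samples_leaf value.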
How to tune a Decision Tree in Hyperparameter tuning
Decision trees are powerful models extensively used in machine learning for classification and regression tasks. Their structure resembles a flowchart of decisions, which makes them easy to interpret and explain. However, the performance of a decision tree relies heavily on its hyperparameters, and selecting good values can significantly impact the model’s accuracy, generalization ability, and robustness.
In this article, we will explore the hyperparameters of decision trees and the different techniques for tuning and optimizing them.
Table of Contents
- Hyperparameters in Decision Trees
- Why Tune hyperparameters in Decision Trees?
- Methods for Hyperparameter Tuning in Decision Tree
- Implementing Hyperparameter Tuning in a decision Tree