ADASYN: Adaptive Synthetic Sampling Approach

ADASYN, an extension of the SMOTE technique, is also used for handling imbalanced datasets. ADASYN examines the local neighborhood of each minority sample: it identifies regions where minority instances are hard to learn because they are surrounded mostly by majority class neighbors, and concentrates synthetic sample generation there. More samples are generated for minority instances in majority-dominated regions and fewer for instances that lie safely within the minority class. This adaptive approach is especially useful when the degree of imbalance varies across the feature space.

Working Procedure of ADASYN

  • Class imbalance ratio: The first step in ADASYN is to quantify the imbalance, typically as the ratio of the number of majority class samples to the number of minority class samples. From this, the total number of synthetic samples needed to balance the classes, G, is derived.
  • Finding the density distribution: For every minority instance, we find its k nearest neighbors using a metric such as Euclidean or Manhattan distance, and compute the fraction r_i of those neighbors that belong to the majority class. A high r_i means the instance sits in a majority-dominated region and is hard to learn; a low r_i means it sits safely inside the minority class.
  • Sample generation ratio: The r_i values are normalized so they sum to one, and each minority instance is assigned its share g_i = r_i × G of the synthetic samples. Instances with more majority class neighbors therefore receive more synthetic samples (see the sketch after this list).
  • Generating synthetic samples: For each minority instance x_i, a random minority class neighbor x_z is chosen and a new sample is created by interpolation, s = x_i + λ · (x_z − x_i), where λ is drawn uniformly from [0, 1].
  • Balanced dataset creation: Appending the synthetic samples to the original data raises the frequency of the minority class. This balances the dataset and helps the model learn the minority class more accurately.
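
To make the density-estimation and sample-generation steps concrete, here is a minimal from-scratch sketch using NumPy and scikit-learn. It is an illustration rather than the imblearn implementation; the function name adasyn_sketch and the beta argument (what fraction of the class gap to close) are our own choices.

Python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def adasyn_sketch(X_min, X_maj, k=5, beta=1.0, seed=None):
    """Generate synthetic minority samples the ADASYN way (sketch)."""
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_min, X_maj])
    # G: total number of synthetic samples needed to close the gap
    G = int((len(X_maj) - len(X_min)) * beta)

    # Density step: fraction of each minority point's k neighbors
    # (in the full dataset) that belong to the majority class
    nn_all = NearestNeighbors(n_neighbors=k + 1).fit(X_all)
    _, idx = nn_all.kneighbors(X_min)        # column 0 is the point itself
    r = (idx[:, 1:] >= len(X_min)).mean(axis=1)
    if r.sum() == 0:                         # no hard-to-learn regions
        return np.empty((0, X_min.shape[1]))
    r = r / r.sum()                          # normalize to a distribution

    # Allocation step: harder regions get more synthetic samples
    g = np.rint(r * G).astype(int)

    # Generation step: interpolate toward random minority neighbors
    nn_min = NearestNeighbors(n_neighbors=min(k + 1, len(X_min))).fit(X_min)
    _, idx_min = nn_min.kneighbors(X_min)
    synthetic = []
    for i, n_new in enumerate(g):
        for _ in range(n_new):
            j = rng.choice(idx_min[i, 1:])   # a random minority neighbor
            lam = rng.random()               # lambda in [0, 1]
            synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)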

Python Implementation For ADASYN

Python
from imblearn.over_sampling import ADASYN

# x and y hold the features and the Outcome target of the
# Pima Indians Diabetes dataset loaded earlier

# Applying ADASYN
adasyn = ADASYN(sampling_strategy='minority')
x_resampled, y_resampled = adasyn.fit_resample(x, y)

# Count outcome values after applying ADASYN
y_resampled.value_counts()

Output:

Outcome
1    500
0    500
Name: count, dtype: int64
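
Note that ADASYN does not guarantee an exactly even split: the number of samples generated around each minority instance depends on the estimated density, so the resampled counts usually land near, rather than exactly at, parity. Here the rounding happens to produce a perfect 500/500 balance.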

SMOTE for Imbalanced Classification with Python

Imbalanced datasets hurt the performance of machine learning models. The Synthetic Minority Over-sampling Technique (SMOTE) addresses the class imbalance problem by generating synthetic samples for the minority class. This article explores SMOTE, its working procedure, and the extensions that enhance its capability, and provides Python implementations for each, offering a practical guide to tackling imbalanced datasets in Python.

Table of Contents

  • Data Imbalance in Classification Problem
  • SMOTE: Synthetic Minority Over-Sampling Technique
  • Extensions of SMOTE
  • ADASYN: Adaptive Synthetic Sampling Approach
  • Borderline SMOTE
  • SMOTE-ENN (Edited Nearest Neighbors)
  • SMOTE-Tomek Links
  • SMOTE-NC (Nominal Continuous)
  • SMOTE for Imbalanced Classification: When to Use

Data Imbalance in Classification Problem

Data imbalance in classification refers to a skewed class distribution that hinders machine learning models' performance: majority classes dominate while minority classes are underrepresented. This challenge arises when one category vastly outnumbers the others. Techniques like oversampling, undersampling, threshold moving, and SMOTE help address the issue. Handling imbalanced datasets is crucial to prevent biased model outputs, especially in multi-class classification problems....

Synthetic Minority Over-Sampling Technique

The Synthetic Minority Over-Sampling Technique (SMOTE) is a powerful method for handling class imbalance in datasets. It generates synthetic examples in the feature space of the minority class, raising the minority count until the class distribution is balanced....
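
As a quick illustration of SMOTE in action, the snippet below applies imblearn's SMOTE to a toy imbalanced dataset; the 90/10 split produced by make_classification is just an example.

Python
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Build a toy dataset with a roughly 90/10 class split
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           random_state=42)
print('Before:', Counter(y))

# SMOTE interpolates new minority samples between minority neighbors
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print('After:', Counter(y_res))   # classes now balanced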

Implementing SMOTE for Imbalanced Classification in Python

In this section, we'll use the Pima Indians Diabetes dataset. In the following code snippet, we load the dataset and plot the class distribution....
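
A sketch of that loading and plotting step might look like the following; the file name diabetes.csv is an assumption, so point it at your local copy of the Pima Indians Diabetes CSV, which stores the target in an Outcome column.

Python
import pandas as pd
import matplotlib.pyplot as plt

# Load the Pima Indians Diabetes dataset (path is illustrative)
df = pd.read_csv('diabetes.csv')
x = df.drop(columns='Outcome')   # eight clinical features
y = df['Outcome']                # 1 = diabetic, 0 = non-diabetic

# Visualize the imbalance (roughly 500 negatives vs 268 positives)
y.value_counts().plot(kind='bar', title='Class distribution')
plt.show()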

Extensions of SMOTE Models

SMOTE effectively addresses data imbalance by generating synthetic samples, enriching the minority class and refining decision boundaries. Despite its benefits, SMOTE’s computational demands can escalate with larger datasets and high-dimensional feature spaces....

Borderline SMOTE

Borderline SMOTE is designed to better address the issue of misclassification of minority class samples that are near the borderline between classes. These samples are often the hardest to classify and are more likely to be mislabeled by classifiers. Borderline SMOTE focuses on generating synthetic samples near the decision boundary between the minority and majority classes. It targets instances that are more challenging to classify, aiming to improve the generalization performance of classifiers....
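
A minimal sketch with imblearn's BorderlineSMOTE, assuming x and y are the features and target loaded earlier:

Python
from imblearn.over_sampling import BorderlineSMOTE

# 'borderline-1' oversamples only those minority points whose
# neighborhoods are dominated by the majority class, i.e. points
# sitting near the decision boundary
bsmote = BorderlineSMOTE(kind='borderline-1', random_state=42)
x_resampled, y_resampled = bsmote.fit_resample(x, y)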

SMOTE-ENN (Edited Nearest Neighbors)

SMOTE-ENN combines the SMOTE method with the Edited Nearest Neighbors (ENN) rule. ENN is used to clean the data by removing any samples that are misclassified by their nearest neighbors. This combination helps in cleaning up the synthetic samples, improving the overall quality of the dataset. The objective of ENN is to remove noisy or ambiguous samples, which may include both minority and majority class instances....
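
A minimal sketch with imblearn's SMOTEENN, again assuming x and y from the earlier-loaded dataset:

Python
from imblearn.combine import SMOTEENN

# Over-sample with SMOTE, then let ENN drop any sample whose
# nearest neighbors disagree with its label (noise cleaning)
smote_enn = SMOTEENN(random_state=42)
x_resampled, y_resampled = smote_enn.fit_resample(x, y)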

SMOTE-Tomek Links

SMOTE+Tomek links combine the SMOTE technique with Tomek links, pairs of instances that are each other's nearest neighbors but belong to opposite classes. Removing Tomek links eliminates instances that are close together yet differently labeled, which reduces overlap between the classes and improves their separability....
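
A minimal sketch with imblearn's SMOTETomek, under the same x and y assumption:

Python
from imblearn.combine import SMOTETomek

# Over-sample with SMOTE, then remove Tomek links: cross-class
# nearest-neighbor pairs that blur the class boundary
smote_tomek = SMOTETomek(random_state=42)
x_resampled, y_resampled = smote_tomek.fit_resample(x, y)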

SMOTE-NC (Nominal Continuous)

SMOTE-NC is a variant of SMOTE that is suitable for datasets containing a mix of nominal (categorical) and continuous features. It modifies the SMOTE algorithm to correctly handle categorical data. The traditional SMOTE algorithm excels in generating synthetic samples to address class imbalance in datasets with only numerical features. However, when categorical features are present, applying SMOTE directly can be problematic. This is because SMOTE operates in the feature space, interpolating between instances based on their numerical attributes. Interpolating between categorical features is not meaningful and can lead to synthetic samples that do not accurately represent the original data....
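
A minimal sketch with imblearn's SMOTENC; the choice of columns 0 and 3 as categorical is purely illustrative and must match your own dataset:

Python
from imblearn.over_sampling import SMOTENC

# Tell SMOTE-NC which columns are categorical (illustrative indices).
# Continuous features are interpolated as usual, while each synthetic
# sample takes the most frequent category among its neighbors.
smote_nc = SMOTENC(categorical_features=[0, 3], random_state=42)
x_resampled, y_resampled = smote_nc.fit_resample(x, y)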

SMOTE for Imbalanced Classification: When to Use

  • Traditional SMOTE. Best use case: general imbalanced datasets where minority class enhancement is needed. Strengths: increases the number of minority class samples through interpolation, improving the generalization ability of classifiers. When to use: when your dataset is imbalanced but doesn't have extreme noise or overlapping class issues; suitable for straightforward augmentation needs.
  • ADASYN (Adaptive Synthetic Sampling). Best use case: datasets where imbalance varies significantly across the feature space. Strengths: focuses on generating samples next to original samples that are harder to learn, adapting to varying degrees of class imbalance. When to use: when certain areas of the feature space are more imbalanced than others, requiring adaptive density estimation.
  • Borderline SMOTE. Best use case: datasets where minority class examples are close to the decision boundary. Strengths: enhances classification near the borderline, where misclassification risk is high. When to use: when data points from different classes overlap and are prone to misclassification, particularly in binary classification problems.
  • SMOTE-NC (Nominal Continuous). Best use case: datasets that include a combination of nominal (categorical) and continuous features. Strengths: handles mixed data types without distorting the categorical feature space. When to use: when your dataset includes both categorical and continuous inputs, ensuring that synthetic samples respect the nature of both data types.
  • SMOTE-ENN (Edited Nearest Neighbors). Best use case: datasets with potential noise and mislabeled examples. Strengths: combines over-sampling with cleaning to remove noisy and misclassified instances. When to use: when the dataset is noisy or contains outliers and you want to refine the class boundary further after over-sampling.
  • SMOTE+Tomek. Best use case: reducing overlap between classes after applying SMOTE. Strengths: cleans the data by removing Tomek links, which can enhance the classifier's performance. When to use: when you need a cleaner dataset with less overlap between classes, suitable for situations where class separation is a priority.

Conclusion

To sum up, SMOTE is an effective technique for handling imbalanced datasets. It identifies the minority class in the dataset and generates synthetic samples for it, balancing the data so that machine learning models can learn from both classes more effectively. It is widely used in classification problems. However, it is essential to analyze the problem carefully before applying the method, as oversampling can introduce trade-offs such as amplified noise or increased class overlap. Overall, SMOTE plays a vital role in handling imbalanced datasets....

FAQs on SMOTE for Imbalanced Classification

What is SMOTE?...