SMOTE-ENN (Edited Nearest Neighbors)
SMOTE-ENN combines SMOTE with the Edited Nearest Neighbors (ENN) rule. ENN cleans the data by removing samples that are misclassified by a majority vote of their nearest neighbors. Applied after SMOTE, this cleaning step improves the quality of the resampled dataset. The objective of ENN is to remove noisy or ambiguous samples, which may include both minority and majority class instances.
Working Procedure of SMOTE-ENN (Edited Nearest Neighbors)
- SMOTE Application: First, apply SMOTE to generate synthetic samples.
- ENN Application: Then, use ENN to remove synthetic or original samples that have a majority of their nearest neighbors belonging to the opposite class.
- Cleaning Data: This step helps in removing noisy instances and those that are likely to be misclassified.
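The ENN cleaning step described above can be sketched directly with scikit-learn. This is a simplified illustration, not imblearn's actual implementation; the function name `enn_clean` and the parameter `k` are chosen here for clarity:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def enn_clean(X, y, k=3):
    """Keep only samples whose k nearest neighbors mostly agree with their label."""
    # Fit k+1 neighbors because each point's nearest neighbor is itself
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    keep = []
    for i, neighbors in enumerate(idx):
        labels = y[neighbors[1:]]  # skip the point itself
        # Keep sample i only if a majority of its neighbors share its class
        if np.sum(labels == y[i]) > k / 2:
            keep.append(i)
    return X[keep], y[keep]
```

For example, a lone class-1 point sitting inside a tight cluster of class-0 points has all class-0 neighbors, so the rule removes it, while points in homogeneous regions are kept.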
Python Implementation for SMOTE-ENN (Edited Nearest Neighbors)
from imblearn.combine import SMOTEENN

# Apply SMOTE over-sampling followed by ENN cleaning
# (x and y are the features and target defined earlier)
smote_enn = SMOTEENN()
X_resampled, y_resampled = smote_enn.fit_resample(x, y)

# Check the class distribution after resampling
y_resampled.value_counts()
Output:
Outcome
1 297
0 215
Name: count, dtype: int64
- Initial Distribution: Before applying SMOTE-ENN, the distribution of the classes was 500 instances of class 0 and 268 instances of class 1.
- SMOTE Oversampling: SMOTE generates synthetic samples for the minority class (class 1) to balance the class distribution. This increases the number of instances in class 1.
- Edited Nearest Neighbors (ENN):
- After SMOTE oversampling, the dataset may contain synthetic samples that ENN considers noisy or misclassified.
- ENN removes these samples, which leads to a reduction in the number of instances for both classes, but especially for class 1 since it was oversampled.
Therefore, after applying SMOTE-ENN, class 1 has 297 instances, and class 0 has 215 instances.
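The oversampling stage of this pipeline can also be sketched by hand: SMOTE creates each synthetic sample by interpolating between a minority sample and one of its nearest minority-class neighbors. The function `smote_sample` below is an illustrative sketch of that idea, not imblearn's implementation:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    each chosen sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(rng)
    # Neighbors are searched within the minority class only
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i][rng.integers(1, k + 1)]  # position 0 is the point itself
        gap = rng.random()  # random point on the segment between the two samples
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Because every synthetic point is a convex combination of two existing minority samples, it always lies on the line segment between them; ENN then prunes any such points that land in majority-class territory.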
SMOTE for Imbalanced Classification with Python
Imbalanced datasets impact the performance of machine learning models, and the Synthetic Minority Over-sampling Technique (SMOTE) addresses the class imbalance problem by generating synthetic samples for the minority class. This article explores SMOTE, its working procedure, and various extensions that enhance its capability, and provides Python implementations for SMOTE and its extensions, offering a comprehensive guide to tackling imbalanced datasets in Python.
Table of Contents
- Data Imbalance in Classification Problem
- SMOTE : Synthetic Minority Over-Sampling Technique
- Extensions of SMOTE
- ADASYN: Adaptive Synthetic Sampling Approach
- Borderline SMOTE
- SMOTE-ENN (Edited Nearest Neighbors)
- SMOTE-Tomek Links
- SMOTE-NC (Nominal Continuous)
- SMOTE for Imbalanced Classification: When to Use