SMOTE-ENN (Edited Nearest Neighbors)
SMOTE-ENN combines SMOTE with the Edited Nearest Neighbors (ENN) rule. ENN cleans the data by removing samples that are misclassified by a majority vote of their nearest neighbors. Applied after SMOTE, this cleaning step improves the quality of the resampled dataset. The objective of ENN is to remove noisy or ambiguous samples, which may include both minority and majority class instances.
Working Procedure of SMOTE-ENN (Edited Nearest Neighbors)
- SMOTE Application: First, apply SMOTE to generate synthetic samples.
- ENN Application: Then, use ENN to remove synthetic or original samples that have a majority of their nearest neighbors belonging to the opposite class.
- Cleaning Data: This step helps in removing noisy instances and those that are likely to be misclassified.
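The ENN cleaning step described above can be sketched directly with scikit-learn. This is a simplified illustration, not imblearn's actual implementation; the function name `enn_clean` and the parameter `k` are chosen here for clarity:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def enn_clean(X, y, k=3):
    """Keep only samples whose k nearest neighbors mostly agree with their label."""
    # Fit k+1 neighbors because each point's nearest neighbor is itself
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    keep = []
    for i, neighbors in enumerate(idx):
        labels = y[neighbors[1:]]  # skip the point itself
        # Keep sample i only if a majority of its neighbors share its class
        if np.sum(labels == y[i]) > k / 2:
            keep.append(i)
    return X[keep], y[keep]
```

For example, a lone class-1 point sitting inside a tight cluster of class-0 points has all class-0 neighbors, so the rule removes it, while points in homogeneous regions are kept.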
Python Implementation for SMOTE-ENN (Edited Nearest Neighbors)
from imblearn.combine import SMOTEENN

# Apply SMOTE over-sampling followed by ENN cleaning
# (x and y are the features and target defined earlier)
smote_enn = SMOTEENN()
X_resampled, y_resampled = smote_enn.fit_resample(x, y)

# Check the class distribution after resampling
y_resampled.value_counts()
Output:
Outcome
1 297
0 215
Name: count, dtype: int64
- Initial Distribution: Before applying SMOTE-ENN, the distribution of the classes was 500 instances of class 0 and 268 instances of class 1.
- SMOTE Oversampling: SMOTE generates synthetic samples for the minority class (class 1) to balance the class distribution. This increases the number of instances in class 1.
- Edited Nearest Neighbors (ENN):
- After SMOTE oversampling, the dataset may contain synthetic samples that ENN considers noisy or misclassified.
- ENN removes these samples, which leads to a reduction in the number of instances for both classes, but especially for class 1 since it was oversampled.
Therefore, after applying SMOTE-ENN, class 1 has 297 instances, and class 0 has 215 instances.
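The oversampling stage of this pipeline can also be sketched by hand: SMOTE creates each synthetic sample by interpolating between a minority sample and one of its nearest minority-class neighbors. The function `smote_sample` below is an illustrative sketch of that idea, not imblearn's implementation:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    each chosen sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(rng)
    # Neighbors are searched within the minority class only
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i][rng.integers(1, k + 1)]  # position 0 is the point itself
        gap = rng.random()  # random point on the segment between the two samples
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Because every synthetic point is a convex combination of two existing minority samples, it always lies on the line segment between them; ENN then prunes any such points that land in majority-class territory.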
SMOTE for Imbalanced Classification with Python
Imbalanced datasets impact the performance of machine learning models, and the Synthetic Minority Over-sampling Technique (SMOTE) addresses the class imbalance problem by generating synthetic samples for the minority class. This article explores SMOTE, its working procedure, and various extensions that enhance its capability, and provides Python implementations for SMOTE and its extensions, offering a comprehensive guide to tackling imbalanced datasets in Python.
Table of Contents
- Data Imbalance in Classification Problem
- SMOTE : Synthetic Minority Over-Sampling Technique
- Extensions of SMOTE
- ADASYN: Adaptive Synthetic Sampling Approach
- Borderline SMOTE
- SMOTE-ENN (Edited Nearest Neighbors)
- SMOTE-Tomek Links
- SMOTE-NC (Nominal Continuous)
- SMOTE for Imbalanced Classification: When to Use