Feature agglomeration vs. univariate selection using Scikit Learn

1. Import Libraries:

The required libraries are imported here:

  • load_iris: A function that loads the Iris dataset.
  • SelectKBest: A class for univariate feature selection.
  • f_classif: A function that computes the ANOVA F-value for each feature.
  • FeatureAgglomeration: A class for feature agglomeration.

Python3
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.cluster import FeatureAgglomeration


2. Load the Iris Dataset:

After loading the Iris dataset, the feature matrix is stored in X and the target labels in y.

Python3
iris = load_iris()
X, y = iris.data, iris.target


3. Feature Agglomeration:

Feature agglomeration is applied to reduce the dataset’s dimensionality. With n_clusters set to 2, the algorithm merges the four features into two clusters, and the transformed data is stored in X_reduced.

Python3
agglomeration = FeatureAgglomeration(n_clusters=2)
X_reduced = agglomeration.fit_transform(X)
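To see which original features were merged together, the fitted estimator’s labels_ attribute can be inspected. A minimal sketch (the exact grouping depends on the data and the default Ward linkage):

```python
from sklearn.cluster import FeatureAgglomeration
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data

agglomeration = FeatureAgglomeration(n_clusters=2)
agglomeration.fit(X)

# labels_[i] is the cluster the i-th original feature was merged into;
# each output column of transform(X) is the mean of the features sharing a label
for name, label in zip(iris.feature_names, agglomeration.labels_):
    print(f"{name} -> cluster {label}")
```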


4. Univariate Selection:

Univariate feature selection is applied using the ANOVA F-value. With k=2, only the two highest-scoring features are kept, and the transformed data is stored in X_k_best.

Python3
k_best = SelectKBest(f_classif, k=2)
X_k_best = k_best.fit_transform(X, y)
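The selector can also report which columns it kept and the F-value behind each ranking. A small sketch using get_support() and scores_:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X, y = iris.data, iris.target

k_best = SelectKBest(f_classif, k=2)
k_best.fit(X, y)

# get_support() is a boolean mask over the original columns;
# scores_ holds the per-feature ANOVA F-values.
# On iris this keeps the two petal measurements.
selected = [n for n, keep in zip(iris.feature_names, k_best.get_support()) if keep]
print("Selected features:", selected)
print("F-values:", k_best.scores_.round(1))
```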


5. Display the Results:

Python3
print("Original Shape:", X.shape)
print("Agglomerated Shape:", X_reduced.shape)
print("Univariate Selection Shape:", X_k_best.shape)


Output:

Original Shape: (150, 4)
Agglomerated Shape: (150, 2)
Univariate Selection Shape: (150, 2)

6. Train the Model on the Agglomerated Dataset:

Python3
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
# DecisionTreeClassifier
tree_clf = DecisionTreeClassifier(criterion='entropy',
                                  max_depth=2)
tree_clf.fit(X_reduced, y)
pred = tree_clf.predict(X_reduced)
 
print(classification_report(y, pred, target_names=iris.target_names))


Output:

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        50
  versicolor       0.96      0.88      0.92        50
   virginica       0.89      0.96      0.92        50

    accuracy                           0.95       150
   macro avg       0.95      0.95      0.95       150
weighted avg       0.95      0.95      0.95       150

7. Train the Model on the Univariate-Selection Dataset:

Python3
# DecisionTreeClassifier
from sklearn.metrics import classification_report
tree_clf = DecisionTreeClassifier(criterion='entropy',
                                  max_depth=2)
tree_clf.fit(X_k_best, y)
pred = tree_clf.predict(X_k_best)
 
print(classification_report(y, pred, target_names=iris.target_names))


Output:

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        50
  versicolor       0.91      0.98      0.94        50
   virginica       0.98      0.90      0.94        50

    accuracy                           0.96       150
   macro avg       0.96      0.96      0.96       150
weighted avg       0.96      0.96      0.96       150

As we can see from the above, univariate feature selection performed slightly better than feature agglomeration on this dataset.
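Note that both reports above evaluate on the training data. A fairer comparison would cross-validate each approach, refitting the reducer inside each fold. A possible sketch (cross_val_score with cv=5 and random_state=0 are illustrative choices, not part of the original article):

```python
from sklearn.cluster import FeatureAgglomeration
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

tree = DecisionTreeClassifier(criterion='entropy', max_depth=2, random_state=0)

# Putting the reducer inside a pipeline refits it on each training fold,
# so no information from the held-out fold leaks into the selection step.
pipelines = {
    "feature agglomeration": make_pipeline(FeatureAgglomeration(n_clusters=2), tree),
    "univariate selection": make_pipeline(SelectKBest(f_classif, k=2), tree),
}
for name, pipe in pipelines.items():
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```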


