Feature agglomeration vs. univariate selection using Scikit Learn
1. Import Libraries:
The required libraries are imported here:
- load_iris: a function that loads the Iris dataset.
- SelectKBest: a class for univariate feature selection that keeps the k highest-scoring features.
- f_classif: a function that computes the ANOVA F-value between each feature and the target.
- FeatureAgglomeration: a class that performs feature agglomeration by merging similar features into clusters.
Python3
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.cluster import FeatureAgglomeration
2. Load the Iris Dataset:
The Iris dataset is loaded; its features are stored in X and its target labels in y.
Python3
iris = load_iris()
X, y = iris.data, iris.target
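To make the later steps concrete, it can help to look at what the loaded dataset actually contains. The following sketch (not part of the original walkthrough) prints the four feature names and three class names that X and y refer to:

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# X holds 150 samples with 4 measurements each; y holds the class label (0-2)
print(iris.feature_names)          # sepal/petal length and width, in cm
print(list(iris.target_names))     # ['setosa', 'versicolor', 'virginica']
print(X.shape, y.shape)
```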
3. Feature Agglomeration:
Feature agglomeration is used to reduce the dataset's dimensionality. With n_clusters set to 2, the algorithm merges the four features into two clusters and replaces each cluster with a pooled value (by default, the mean of its features). The transformed data is stored in X_reduced.
Python3
agglomeration = FeatureAgglomeration(n_clusters=2)
X_reduced = agglomeration.fit_transform(X)
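After fitting, the labels_ attribute shows which cluster each original feature was assigned to, which makes the "two clusters of features" idea tangible. A minimal sketch (the loop and its printout are an addition, not part of the original article):

```python
from sklearn.datasets import load_iris
from sklearn.cluster import FeatureAgglomeration

iris = load_iris()
X = iris.data

agglomeration = FeatureAgglomeration(n_clusters=2)
X_reduced = agglomeration.fit_transform(X)

# labels_ assigns each of the 4 original features to one of the 2 clusters;
# each output column of X_reduced is the mean of the features in its cluster.
for name, label in zip(iris.feature_names, agglomeration.labels_):
    print(f"{name} -> cluster {label}")
```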
4. Univariate Selection:
Univariate feature selection is applied using the ANOVA F-value as the scoring function. With k=2, only the two highest-scoring features are kept. The transformed data is stored in X_k_best.
Python3
k_best = SelectKBest(f_classif, k=2)
X_k_best = k_best.fit_transform(X, y)
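It is also possible to see which two features were selected and why: after fitting, scores_ holds the ANOVA F-value of each feature and get_support() marks the k highest-scoring ones. The inspection loop below is an added sketch, not part of the original article; on the Iris data, the two petal measurements score far higher than the sepal ones.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

iris = load_iris()
X, y = iris.data, iris.target

k_best = SelectKBest(f_classif, k=2)
X_k_best = k_best.fit_transform(X, y)

# scores_ gives each feature's ANOVA F-value; get_support() flags the k kept ones
for name, score, kept in zip(iris.feature_names, k_best.scores_, k_best.get_support()):
    print(f"{name}: F={score:.1f}, kept={kept}")
```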
5. Display the Results:
Python3
print("Original Shape:", X.shape)
print("Agglomerated Shape:", X_reduced.shape)
print("Univariate Selection Shape:", X_k_best.shape)
Output:
Original Shape: (150, 4)
Agglomerated Shape: (150, 2)
Univariate Selection Shape: (150, 2)
6. Train a Model on the Agglomerated Dataset:
A decision tree classifier is trained on X_reduced and evaluated on the same data.
Python3
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

# Decision tree on the agglomerated features
tree_clf = DecisionTreeClassifier(criterion='entropy', max_depth=2)
tree_clf.fit(X_reduced, y)
pred = tree_clf.predict(X_reduced)
print(classification_report(y, pred, target_names=iris.target_names))
Output:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        50
  versicolor       0.96      0.88      0.92        50
   virginica       0.89      0.96      0.92        50

    accuracy                           0.95       150
   macro avg       0.95      0.95      0.95       150
weighted avg       0.95      0.95      0.95       150
7. Train a Model on the Univariate Selection Dataset:
The same decision tree classifier is trained on X_k_best and evaluated on the same data.
Python3
# Decision tree on the univariate-selected features
tree_clf = DecisionTreeClassifier(criterion='entropy', max_depth=2)
tree_clf.fit(X_k_best, y)
pred = tree_clf.predict(X_k_best)
print(classification_report(y, pred, target_names=iris.target_names))
Output:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        50
  versicolor       0.91      0.98      0.94        50
   virginica       0.98      0.90      0.94        50

    accuracy                           0.96       150
   macro avg       0.96      0.96      0.96       150
weighted avg       0.96      0.96      0.96       150
As the reports above show, univariate feature selection (accuracy 0.96) performed slightly better than feature agglomeration (accuracy 0.95) on this dataset. Note that both models were evaluated on the same data they were trained on, so these figures are optimistic.
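A fairer comparison uses held-out data. The sketch below (an addition, not from the original article) compares the two reduced datasets with 5-fold cross-validation; the specific scores will differ somewhat from the training-set figures above. Strictly, the reducers are fitted on the full data here, so a small amount of leakage remains; wrapping each reducer and the classifier in a Pipeline would avoid that.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import FeatureAgglomeration
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Build both reduced datasets as in the steps above
X_reduced = FeatureAgglomeration(n_clusters=2).fit_transform(X)
X_k_best = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Same tree as above; random_state fixed for reproducibility
clf = DecisionTreeClassifier(criterion='entropy', max_depth=2, random_state=0)

acc_agg = cross_val_score(clf, X_reduced, y, cv=5).mean()
acc_uni = cross_val_score(clf, X_k_best, y, cv=5).mean()
print("Agglomeration CV accuracy:", acc_agg)
print("Univariate   CV accuracy:", acc_uni)
```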