In this implementation, we use a Voting Classifier with a Support Vector Machine (SVM) and a Decision Tree (DT) as base estimators on the breast cancer dataset.
Importing Necessary Libraries
Python3
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
Loading and splitting the dataset
Python3
breast_cancer = load_breast_cancer()
X_bc, y_bc = breast_cancer.data, breast_cancer.target
X_train_bc, X_test_bc, y_train_bc, y_test_bc = train_test_split(X_bc, y_bc, test_size=0.2, random_state=42)
Creating Base Estimators
- SVC (Support Vector Classifier): The probability=True parameter allows the model to predict probabilities for each class, which is necessary for soft voting in the VotingClassifier.
- DecisionTreeClassifier: This classifier predicts the value of a target variable by learning simple decision rules inferred from the data features. Each internal node represents a "test" on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label. In the VotingClassifier, the decision tree serves as the second base estimator.
Python3
svm_bc = SVC(probability=True)
dt_bc = DecisionTreeClassifier()
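To make the role of probability=True concrete, here is a minimal sketch that fits an SVC on the same split and inspects its class probabilities. The names svm_demo and proba and the random_state=42 on the SVC are assumptions added for illustration and reproducibility; they are not part of the walkthrough above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same dataset and split as in the walkthrough
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# probability=True makes predict_proba available on the fitted model.
# random_state is an assumption added here so repeated runs agree.
svm_demo = SVC(probability=True, random_state=42)
svm_demo.fit(X_train, y_train)

# One row per sample, one column per class; each row sums to 1
proba = svm_demo.predict_proba(X_test[:3])
print(proba.shape)
print(proba)
```

Soft voting relies on these per-class probabilities, which is why the SVC in the ensemble is created with probability=True.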
Ensemble Learning
- VotingClassifier creation: The VotingClassifier is created with estimators=[('svm', svm_bc), ('dt', dt_bc)], specifying the list of base estimators to be used for the voting. The voting='soft' parameter indicates that the classifier will use soft voting, which means it predicts the class label based on the argmax of the sums of the predicted probabilities.
- Training the voting classifier: The fit method is called on the voting_clf_bc object with the training data X_train_bc and y_train_bc to train the classifier on the breast cancer dataset.
Python3
voting_clf_bc = VotingClassifier(estimators=[('svm', svm_bc), ('dt', dt_bc)], voting='soft')
voting_clf_bc.fit(X_train_bc, y_train_bc)
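The soft-voting rule described above can be sketched by hand: average the class probabilities of the fitted base estimators and take the argmax. This is an illustrative reconstruction under the assumption of equal weights, not a replacement for VotingClassifier; the names clf, probas, avg_proba and manual_pred, and the random_state=0 settings, are introduced only for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# random_state values are assumptions added for reproducibility
clf = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="soft",
)
clf.fit(X_train, y_train)

# Class probabilities from each fitted base estimator,
# stacked to shape (n_estimators, n_samples, n_classes)
probas = np.stack([est.predict_proba(X_test) for est in clf.estimators_])

# Equal-weight soft voting: average over estimators, then argmax over classes
avg_proba = probas.mean(axis=0)
manual_pred = clf.classes_[np.argmax(avg_proba, axis=1)]

# The manual result should agree with the ensemble's own predictions
matches = np.array_equal(manual_pred, clf.predict(X_test))
print(matches)
```

Averaging and summing give the same argmax when the estimators are equally weighted, so this matches the "argmax of the sums" description.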
Evaluation of the Model
- Making predictions: The predict method is called on the voting_clf_bc object with the test data X_test_bc to make predictions for the breast cancer dataset.
- Evaluating accuracy: The accuracy_score function compares the predicted labels y_pred_bc with the actual labels y_test_bc from the test set. The accuracy is then printed to the console using f-string formatting.
Python3
y_pred_bc = voting_clf_bc.predict(X_test_bc)
accuracy_bc = accuracy_score(y_test_bc, y_pred_bc)
print(f'Accuracy on breast cancer dataset: {accuracy_bc}')
Accuracy on breast cancer dataset: 0.9385964912280702
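A single accuracy figure is easier to interpret next to the scores of the individual base estimators. The sketch below is one possible way to make that comparison on the same split; the models and scores dictionaries and the random_state=42 on each estimator are assumptions added for illustration. Because the walkthrough's DecisionTreeClassifier is created without a fixed seed, the numbers printed here may differ from the accuracy shown above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Each base estimator on its own, plus the soft-voting ensemble.
# random_state is an assumption added so repeated runs give the same scores.
models = {
    "svm": SVC(probability=True, random_state=42),
    "dt": DecisionTreeClassifier(random_state=42),
    "voting": VotingClassifier(
        estimators=[
            ("svm", SVC(probability=True, random_state=42)),
            ("dt", DecisionTreeClassifier(random_state=42)),
        ],
        voting="soft",
    ),
}

# Fit every model on the training split and score it on the test split
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

The ensemble is not guaranteed to beat both base estimators on a given split, so a comparison like this is a useful sanity check before relying on it.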
Ensemble Learning with SVM and Decision Trees
Ensemble learning is a machine learning technique that combines multiple individual models to improve predictive performance. Support Vector Machines (SVMs) and Decision Trees are two popular algorithms that can serve as base models in such an ensemble.