ROC Curve in Python
Let’s implement roc curve in python using breast cancer in-built dataset. The breast cancer dataset is a commonly used dataset in machine learning, for binary classification tasks.
Step 1: Importing the required libraries
In scikit-learn, the roc_curve
function is used to compute Receiver Operating Characteristic (ROC) curve points. On the other hand, the auc
function calculates the Area Under the Curve (AUC) from the ROC curve.
AUC is a scalar value representing the area under the ROC curve quantifing the classifier’s ability to distinguish between positive and negative examples across all possible classification thresholds.
Python3
import matplotlib.pyplot as plt from sklearn.datasets import load_breast_cancer from sklearn.model_selection import train_test_split from sklearn.linear_model import LogisticRegression from sklearn.metrics import roc_curve, auc |
Step 2: Loading the dataset
Python3
data = load_breast_cancer() X = data.data y = data.target # Split the data into features (X) and target variable (y) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25 , random_state = 42 ) |
Step 3: Training and testing the model
Python3
# Train a logistic regression model model = LogisticRegression() model.fit(X_train, y_train) # Predict probabilities on the test set y_pred_proba = model.predict_proba(X_test)[:, 1 ] |
Step 4: Plot the ROC Curve
- The
roc_curve
function is used to calculate the False Positive Rates (FPR), True Positive Rates (TPR), and corresponding thresholds with true labels and the predicted probabilities of belonging to the positive class as inputs. plt.plot([0, 1], [0, 1], 'k--', label='No Skill')
is used to plot a diagonal dashed line representing a classifier with no discriminative power (random guessing).
Python3
# Calculate ROC curve fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba) roc_auc = auc(fpr, tpr) # Plot the ROC curve plt.figure() plt.plot(fpr, tpr, label = 'ROC curve (area = %0.2f)' % roc_auc) plt.plot([ 0 , 1 ], [ 0 , 1 ], 'k--' , label = 'No Skill' ) plt.xlim([ 0.0 , 1.0 ]) plt.ylim([ 0.0 , 1.05 ]) plt.xlabel( 'False Positive Rate' ) plt.ylabel( 'True Positive Rate' ) plt.title( 'ROC Curve for Breast Cancer Classification' ) plt.legend() plt.show() |
Output:
The dashed line represents the ROC Curve for the classifier. The AUC as 1.00, signifies perfect classification, meaning the model can distinguish malignant from benign tumors flawlessly at any threshold.
How to plot ROC curve in Python
The Receiver Operating Characteristic (ROC) curve is a fundamental tool in the field of machine learning for evaluating the performance of classification models. In this context, we’ll explore the ROC curve and its associated metrics using the breast cancer dataset, a widely used dataset for binary classification tasks.