ROC Curve in Python

Let’s implement roc curve in python using breast cancer in-built dataset. The breast cancer dataset is a commonly used dataset in machine learning, for binary classification tasks.

Step 1: Importing the required libraries

In scikit-learn, the roc_curve function is used to compute Receiver Operating Characteristic (ROC) curve points. On the other hand, the auc function calculates the Area Under the Curve (AUC) from the ROC curve.

AUC is a scalar value representing the area under the ROC curve quantifing the classifier’s ability to distinguish between positive and negative examples across all possible classification thresholds.

Python3




import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc


Step 2: Loading the dataset

Python3




data = load_breast_cancer()
X = data.data
y = data.target # Split the data into features (X) and target variable (y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)


Step 3: Training and testing the model

Python3




# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)
# Predict probabilities on the test set
y_pred_proba = model.predict_proba(X_test)[:, 1]


Step 4: Plot the ROC Curve

  • The roc_curve function is used to calculate the False Positive Rates (FPR), True Positive Rates (TPR), and corresponding thresholds with true labels and the predicted probabilities of belonging to the positive class as inputs.
  • plt.plot([0, 1], [0, 1], 'k--', label='No Skill') is used to plot a diagonal dashed line representing a classifier with no discriminative power (random guessing).

Python3




# Calculate ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
roc_auc = auc(fpr, tpr)
# Plot the ROC curve
plt.figure() 
plt.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
plt.plot([0, 1], [0, 1], 'k--', label='No Skill')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve for Breast Cancer Classification')
plt.legend()
plt.show()


Output:

The dashed line represents the ROC Curve for the classifier. The AUC as 1.00, signifies perfect classification, meaning the model can distinguish malignant from benign tumors flawlessly at any threshold.

How to plot ROC curve in Python

The Receiver Operating Characteristic (ROC) curve is a fundamental tool in the field of machine learning for evaluating the performance of classification models. In this context, we’ll explore the ROC curve and its associated metrics using the breast cancer dataset, a widely used dataset for binary classification tasks.

Similar Reads

What is the ROC Curve?

The ROC curve stands for Receiver Operating Characteristics Curve and is an evaluation metric for classification tasks and it is a probability curve that plots sensitivity and specificity. So, we can say that the ROC Curve can also be defined as the evaluation metric that plots the sensitivity against the false positive rate. The ROC curve plots two different parameters given below:...

Types of ROC Curve

There are two types of ROC Curves:...

ROC Curve in Python

Let’s implement roc curve in python using breast cancer in-built dataset. The breast cancer dataset is a commonly used dataset in machine learning, for binary classification tasks....

How Ideal Curve looks like?

...

Advantages of ROC Curve

...

Disadvantages of ROC Curve

...