What are Evaluation Metrics?
Evaluation metrics are quantitative measures used to assess the performance of machine learning models. These metrics provide insights into how well a model is performing and can help guide decisions on model selection, parameter tuning, and feature engineering.
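- Precision: The proportion of the model's positive predictions that are actually positive, TP / (TP + FP), where TP and FP are the numbers of true positives and false positives.
- Recall: The proportion of actual positive instances in the dataset that the model predicts as positive, TP / (TP + FN), where FN is the number of false negatives.
- F1 Score: The harmonic mean of precision and recall, 2 × Precision × Recall / (Precision + Recall). It is high only when both precision and recall are high, which makes it a balanced single-number summary.
- ROC AUC: The area under the Receiver Operating Characteristic curve, which plots the true positive rate against the false positive rate as the decision threshold of a binary classifier varies. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.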
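The four definitions can be checked by hand on a small example. The sketch below uses hypothetical labels and scores (not taken from the article) and plain Python; the predicted labels come from thresholding the scores at 0.5, a threshold chosen only for illustration. ROC AUC is computed as the fraction of (positive, negative) pairs in which the positive sample gets the higher score, with ties counting one half, which equals the area under the ROC curve.

```python
# Hypothetical toy data (not from the article): 1 = positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.05]  # scores for class 1
y_pred = [1 if s >= 0.5 else 0 for s in y_score]               # threshold 0.5 (assumed)


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN) for binary labels with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def roc_auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


tp, fp, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)                           # TP / (TP + FP)
recall = tp / (tp + fn)                              # TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
auc = roc_auc(y_true, y_score)

print(f"TP={tp} FP={fp} FN={fn}")  # TP=3 FP=1 FN=1
print(precision, recall, f1)       # 0.75 0.75 0.75
print(round(auc, 4))               # 0.9167
```

Precision, recall, and F1 depend on the chosen threshold, while ROC AUC uses only the ordering of the scores: here 22 of the 24 positive/negative pairs are ordered correctly, whatever threshold is picked.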
Implementation of Custom Metrics in PyTorch
Dataset Loading
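- This code loads the CIFAR-10 dataset with `torchvision.datasets.CIFAR10`, downloading it to `./data` if needed, and uses the dataset's predefined training split (50,000 images) and test split (10,000 images).
- Each image is converted to a tensor with `ToTensor()` and normalized with `Normalize`, so that pixel values lie in [-1, 1].
- To turn the 10-class problem into a binary one, only images with label 0 (airplane) or 1 (automobile) are kept, using `Subset` with the indices whose target is at most 1.
- The filtered datasets are wrapped in `DataLoader` objects with a batch size of 64; the training loader shuffles the data and the test loader does not.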
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, Subset

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Load CIFAR-10 dataset; scale each RGB channel from [0, 1] to [-1, 1]
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_data = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_data = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

# Keep only classes 0 (airplane) and 1 (automobile) for binary classification
binary_train_data = Subset(train_data, [i for i in range(len(train_data)) if train_data.targets[i] <= 1])
binary_test_data = Subset(test_data, [i for i in range(len(test_data)) if test_data.targets[i] <= 1])

train_loader = DataLoader(dataset=binary_train_data, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=binary_test_data, batch_size=64, shuffle=False)
```
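The filter above keeps samples with `targets[i] <= 1`, which works only for classes 0 and 1. To train on any other pair of CIFAR-10 classes, the samples have to be selected by class membership and their labels remapped to 0 and 1, because the MLP has two outputs and `CrossEntropyLoss` expects class indices in that range. A minimal sketch, assuming a hypothetical wrapper named `BinaryClassSubset`; a small `TensorDataset` with made-up labels stands in for CIFAR-10 so it runs without a download.

```python
import torch
from torch.utils.data import Dataset, TensorDataset


class BinaryClassSubset(Dataset):
    """Hypothetical wrapper: keep two classes of a labelled dataset, relabelled as 0/1.

    `targets` must hold the label of every sample in `base`, as CIFAR10.targets does.
    """

    def __init__(self, base, targets, negative_class, positive_class):
        self.base = base
        self.positive_class = positive_class
        # Indices of the samples that belong to either of the two chosen classes
        self.indices = [i for i, t in enumerate(targets) if t in (negative_class, positive_class)]

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, idx):
        image, label = self.base[self.indices[idx]]
        # Positive class -> 1, negative class -> 0
        return image, int(label == self.positive_class)


# Toy stand-in for CIFAR-10 (hypothetical labels, blank 3x32x32 "images")
targets = [0, 3, 5, 1, 5, 3, 9]
base = TensorDataset(torch.zeros(len(targets), 3, 32, 32), torch.tensor(targets))

subset = BinaryClassSubset(base, targets, negative_class=3, positive_class=5)
print(len(subset), [subset[i][1] for i in range(len(subset))])  # 4 [0, 1, 1, 0]
```

With the real data, `BinaryClassSubset(train_data, train_data.targets, 3, 5)` would select cats (label 3) versus dogs (label 5), and `BinaryClassSubset(train_data, train_data.targets, 0, 1)` keeps the same airplane/automobile samples as the `Subset` filter above.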
Model Building
Next, we define and train a simple MLP model in PyTorch using the following steps:
- Define the MLP model: The `MLP` class inherits from `nn.Module`, the base class for all neural network modules in PyTorch. In the constructor (`__init__`), the model is defined with two fully connected (linear) layers (`fc1` and `fc2`) separated by a ReLU activation (`relu`).
- Forward pass: The `forward` method defines how the input `x` is processed by the network. The input is flattened with `view(-1, 3 * 32 * 32)` to match the input size expected by `fc1`, passed through `relu`, and finally through `fc2`, which outputs two logits, one per class.
- Move model to device: The model is moved to the selected `device` (the GPU if available, otherwise the CPU) with the `to` method.
- Define loss and optimizer: `CrossEntropyLoss` is used as the loss function. It is usually described as a multi-class loss, but with two output logits it handles binary classification as well. The `Adam` optimizer with a learning rate of 0.001 updates the model parameters from the computed gradients.
- Training loop: The model is trained for `num_epochs` (5) epochs. In each epoch, the training data (`train_loader`) is iterated over in batches. For each batch, the images and labels are moved to `device`, the model produces predictions (`outputs`), and the loss is computed from the outputs and the true labels. The gradients are then reset (`zero_grad`), computed (`backward`), and applied to the parameters (`step`).
- Print epoch and loss: At the end of each epoch, the epoch number and the loss of the last batch in that epoch are printed. Because this is a single-batch value rather than an epoch average, it can fluctuate from one epoch to the next.
```python
# Define MLP model
class MLP(nn.Module):
    def __init__(self):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(3 * 32 * 32, 512)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(512, 2)

    def forward(self, x):
        # Flatten each 3x32x32 image into a 3072-dimensional vector
        out = self.fc1(x.view(-1, 3 * 32 * 32))
        out = self.relu(out)
        out = self.fc2(out)
        return out

model = MLP().to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
num_epochs = 5
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images, labels = images.to(device), labels.to(device)
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Prints the loss of the last batch of the epoch, not an epoch average
    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {loss.item()}')
```
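Because the loop above prints `loss.item()` for the last batch only, the printed value is noisy. A minimal sketch of tracking the mean training loss per epoch instead, weighting each batch by its size; `train_one_epoch` is a hypothetical helper, the `MLP` class is repeated so the block runs on its own, and random tensors stand in for the CIFAR-10 subset so nothing has to be downloaded.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset


class MLP(nn.Module):
    """Same architecture as above: 3*32*32 inputs -> 512 hidden units -> 2 logits."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(3 * 32 * 32, 512)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(512, 2)

    def forward(self, x):
        # torch.flatten(x, 1) keeps the batch dimension, like x.view(-1, 3 * 32 * 32)
        return self.fc2(self.relu(self.fc1(torch.flatten(x, 1))))


def train_one_epoch(model, loader, criterion, optimizer, device):
    """Hypothetical helper: run one training epoch and return the mean loss per sample."""
    model.train()
    total_loss, total_samples = 0.0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        loss = criterion(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # CrossEntropyLoss averages over the batch, so re-weight by the batch size
        total_loss += loss.item() * labels.size(0)
        total_samples += labels.size(0)
    return total_loss / total_samples


torch.manual_seed(0)
device = torch.device("cpu")
# Synthetic stand-in for the CIFAR-10 subset: random images with random 0/1 labels
data = TensorDataset(torch.randn(128, 3, 32, 32), torch.randint(0, 2, (128,)))
loader = DataLoader(data, batch_size=32, shuffle=True)

model = MLP().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Sanity check: a batch of 4 images yields 4 rows of 2 logits
assert model(torch.randn(4, 3, 32, 32)).shape == (4, 2)

num_epochs = 3
epoch_losses = []
for epoch in range(num_epochs):
    epoch_losses.append(train_one_epoch(model, loader, criterion, optimizer, device))
    print(f"Epoch {epoch+1}/{num_epochs}, Mean loss: {epoch_losses[-1]:.4f}")
```

In the article's setup, calling `train_one_epoch(model, train_loader, criterion, optimizer, device)` once per epoch would take the place of the inner loop, and the printed value would then summarize the whole epoch rather than its final batch.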
Calculating Model Metrics: Precision, Recall, F1-score, ROC AUC
- Set model to evaluation mode: `model.eval()` switches layers such as dropout and batch normalization to their inference behavior. This MLP contains neither, so the call changes nothing here, but it is good practice before evaluating any model.
- Iterate over test dataset: The code iterates over the test dataset (`test_loader`) and makes predictions with the trained model for each batch of images. The predicted label of each image (`predicted`) is the index of the largest logit along dimension 1 of the output tensor (`outputs`), returned as the second element of `torch.max(outputs, 1)`.
- Collect predictions and labels: For every batch, the predicted and true labels are moved to the CPU, converted to NumPy arrays, and their elements are appended to the Python lists `y_pred` and `y_true`.
- Convert lists to tensors: The lists `y_pred` and `y_true` are converted to tensors (`y_pred_tensor` and `y_true_tensor`) so the counts can be computed with vectorized tensor operations.
- Calculate precision, recall, and F1 score: True positives (TP), false positives (FP), and false negatives (FN) are counted from the predicted and true labels, with class 1 (automobile) as the positive class. Precision, recall, and F1 score are then computed from these counts, falling back to 0 whenever a denominator would be zero.
- Print the results: Precision, recall, and F1 score are printed to the console.
This approach to a binary classification problem covers dataset loading, model definition and training, and evaluation with custom metrics. Precision, recall, and F1 score give a more complete picture of the model's performance than accuracy alone, and ROC AUC adds a view that does not depend on a single decision threshold. Precision shows how many of the predicted positives are correct, recall shows how many of the actual positives the model finds, and the F1 score summarizes the trade-off between false positives and false negatives.
```python
# Evaluate the model using PyTorch
model.eval()
y_true = []
y_pred = []
# Gradients are not needed for evaluation; no_grad saves memory and time
with torch.no_grad():
    for images, labels in test_loader:
        images = images.to(device)
        outputs = model(images)
        # torch.max returns (values, indices); the indices are the predicted classes
        _, predicted = torch.max(outputs, 1)
        y_pred.extend(predicted.cpu().numpy())
        y_true.extend(labels.cpu().numpy())

# Convert lists to tensors for calculation
y_true_tensor = torch.tensor(y_true)
y_pred_tensor = torch.tensor(y_pred)

# Calculating precision, recall, and F1 score using PyTorch (class 1 is the positive class)
TP = ((y_pred_tensor == 1) & (y_true_tensor == 1)).sum().item()
FP = ((y_pred_tensor == 1) & (y_true_tensor == 0)).sum().item()
FN = ((y_pred_tensor == 0) & (y_true_tensor == 1)).sum().item()

precision = TP / (TP + FP) if TP + FP > 0 else 0
recall = TP / (TP + FN) if TP + FN > 0 else 0
f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

print(f'Precision: {precision}')
print(f'Recall: {recall}')
print(f'F1 Score: {f1}')
```
Output:
```
Epoch 1/5, Loss: 0.29176822304725647
Epoch 2/5, Loss: 0.43448692560195923
Epoch 3/5, Loss: 0.0890989825129509
Epoch 4/5, Loss: 0.28986942768096924
Epoch 5/5, Loss: 0.3814219832420349
Precision: 0.8680154142581888
Recall: 0.901
F1 Score: 0.8842001962708538
```
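The evaluation code computes precision, recall, and F1 score but not ROC AUC, which needs a continuous score for each image rather than the arg-max label. A minimal sketch, assuming the softmax probability of class 1 is used as the score: `roc_auc_pairwise` is a hypothetical helper that returns the fraction of (positive, negative) pairs in which the positive image has the higher score, with ties counting one half, which equals the area under the ROC curve. The commented lines show one way to collect the scores from the trained model; they assume `model`, `test_loader`, and `device` from above.

```python
import torch


def roc_auc_pairwise(y_true, y_score):
    """Hypothetical helper: binary ROC AUC from 0/1 labels and class-1 scores.

    Builds an (n_pos x n_neg) comparison matrix, so memory grows quadratically;
    that is fine for the 2,000-image binary test set but not for huge datasets.
    """
    y_true = torch.as_tensor(y_true)
    y_score = torch.as_tensor(y_score, dtype=torch.float64)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    if pos.numel() == 0 or neg.numel() == 0:
        raise ValueError("ROC AUC is undefined when only one class is present")
    diff = pos[:, None] - neg[None, :]
    # Correctly ordered pairs count 1, tied pairs count 1/2
    wins = (diff > 0).to(torch.float64).sum() + 0.5 * (diff == 0).to(torch.float64).sum()
    return (wins / (pos.numel() * neg.numel())).item()


# Toy check with hypothetical scores: 22 of the 24 positive/negative pairs are ordered correctly
y_true = torch.tensor([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_score = torch.tensor([0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.05])
auc = roc_auc_pairwise(y_true, y_score)
print(round(auc, 4))  # 0.9167

# Collecting scores from the trained model (assumes model, test_loader, device from above):
# scores, targets = [], []
# model.eval()
# with torch.no_grad():
#     for images, labels in test_loader:
#         probs = torch.softmax(model(images.to(device)), dim=1)[:, 1]  # P(class 1)
#         scores.append(probs.cpu())
#         targets.append(labels)
# print(f'ROC AUC: {roc_auc_pairwise(torch.cat(targets), torch.cat(scores))}')
```

If scikit-learn is available, `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity with a sorting-based method that scales better to large test sets.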
How to calculate the F1 score and other custom metrics in PyTorch?
Evaluating deep learning models goes beyond training them: it means rigorously checking their performance to ensure they are accurate, reliable, and efficient for real-world use. This evaluation is critical because it shows how well a model has learned and how effective it is likely to be in practice. Custom metrics are essential here, especially when standard metrics such as accuracy are not enough, for example on an imbalanced dataset, where a model can reach high accuracy simply by always predicting the majority class. Here, we will see how to use PyTorch to calculate the F1 score and other metrics.
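A minimal sketch of why accuracy alone can mislead, using hypothetical labels with 95 negatives and 5 positives and a degenerate classifier that always predicts the negative class. The zero-division fallback (returning 0) follows the same convention as the evaluation code in this article.

```python
# Hypothetical imbalanced labels: 95 negatives (0) and 5 positives (1)
y_true = [0] * 95 + [1] * 5
# Degenerate "model" that always predicts the majority class
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

# Fall back to 0 when a denominator is zero
precision = tp / (tp + fp) if tp + fp > 0 else 0.0
recall = tp / (tp + fn) if tp + fn > 0 else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0

print(accuracy, precision, recall, f1)  # 0.95 0.0 0.0 0.0
```

Accuracy rewards the classifier for the 95 easy negatives, while recall and F1 reveal that it never finds a single positive.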