Implementing Feedforward Neural Network for MNIST
For a better understanding, let's see how to build a neural network in PyTorch. Be aware that these are brief samples that you can expand and adapt to your needs, not comprehensive solutions. In this example, a simple feedforward neural network classifies handwritten digits from the MNIST dataset.
- We define a straightforward feedforward neural network with two fully connected layers. A layer is fully connected when every input unit is linked to every output unit through a weight matrix and a bias vector.
- The first layer takes the flattened image (28×28 pixels) as input and produces 512 features. The second layer takes those 512 features as input and produces 10 outputs, one per class (the digits 0 through 9).
- We use the nn.Linear class to create the fully connected layers and attach them as attributes of the network object. We also apply the ReLU activation function to the first layer's output using F.relu, which introduces the non-linearity the network needs to learn intricate patterns.
- The forward method simply flattens the input image, then applies the first layer, the ReLU function, and the second layer. The network's output is a tensor of ten logits, one per class.
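As a quick aside, the weight matrix and bias vector of a fully connected layer can be inspected directly. The sketch below (illustrative, not part of the tutorial's steps) prints the parameter shapes of a layer like the first one we will define:
Python
import torch.nn as nn

# A fully connected layer mapping 784 inputs to 512 outputs
layer = nn.Linear(28 * 28, 512)
print(layer.weight.shape)  # torch.Size([512, 784]) -- the weight matrix
print(layer.bias.shape)    # torch.Size([512])      -- the bias vector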
Step 1: Import the necessary libraries
Python
# Import the necessary libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np
Step 2: Define the hyperparameters and transformation
The code below defines the hyperparameters and a transformation to apply to the images. The hyperparameters batch_size, num_epochs, and learning_rate control the training process. A transformation pipeline, transform, preprocesses the input images with two sequential transformations: transforms.ToTensor() converts each image into a PyTorch tensor, the format required for neural network computations, and transforms.Normalize() standardizes the pixel values by subtracting the mean (0.1307) and dividing by the standard deviation (0.3081).
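To see what the normalization does, here is a small sanity check (an illustrative sketch, separate from the tutorial's steps) that applies Normalize to a constant image by hand:
Python
import torch
from torchvision import transforms

# Normalize applies (x - mean) / std channel-wise
normalize = transforms.Normalize((0.1307,), (0.3081,))
x = torch.full((1, 28, 28), 0.1307)  # a fake image whose pixels all equal the MNIST mean
print(normalize(x).abs().max())      # tensor(0.) -- the mean pixel value maps to 0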
Python
# Define the hyperparameters
batch_size = 64       # The number of samples per batch
num_epochs = 10       # The number of times to iterate over the whole dataset
learning_rate = 0.01  # The learning rate for the optimizer

# Define the transformation to apply to the images
transform = transforms.Compose([
    transforms.ToTensor(),                      # Convert the images to tensors
    transforms.Normalize((0.1307,), (0.3081,))  # Normalize the pixel values with mean and std
])
Step 3: Load and prepare the dataset
The provided code loads the MNIST dataset from the web, consisting of handwritten digit images and their corresponding labels. It initializes two datasets: train_dataset for training data and test_dataset for testing data. Both datasets are configured with transformations defined earlier, enabling image tensor conversion and pixel value normalization. Subsequently, data loaders, train_loader and test_loader, are created to facilitate batching and shuffling of data during training and testing phases, respectively.
Python
# Load the MNIST dataset from the web
train_dataset = datasets.MNIST(root='.', train=True, download=True, transform=transform)  # The training set
test_dataset = datasets.MNIST(root='.', train=False, download=True, transform=transform)  # The test set

# Create the data loaders for batching and shuffling the data
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)  # The training loader
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)   # The test loader
Output:
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./MNIST/raw/train-images-idx3-ubyte.gz
100%|██████████| 9912422/9912422 [00:00<00:00, 78077039.54it/s]
Extracting ./MNIST/raw/train-images-idx3-ubyte.gz to ./MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./MNIST/raw/train-labels-idx1-ubyte.gz
100%|██████████| 28881/28881 [00:00<00:00, 65021843.17it/s]
Extracting ./MNIST/raw/train-labels-idx1-ubyte.gz to ./MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./MNIST/raw/t10k-images-idx3-ubyte.gz
100%|██████████| 1648877/1648877 [00:00<00:00, 22545472.73it/s]
Extracting ./MNIST/raw/t10k-images-idx3-ubyte.gz to ./MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./MNIST/raw/t10k-labels-idx1-ubyte.gz
100%|██████████| 4542/4542 [00:00<00:00, 12298598.30it/s]
Extracting ./MNIST/raw/t10k-labels-idx1-ubyte.gz to ./MNIST/raw
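Before defining the model, it can help to confirm what one batch from these loaders looks like. This small check (illustrative, not part of the original steps) prints the batch shapes:
Python
# Inspect one batch from the training loader
images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([64, 1, 28, 28]) -- 64 single-channel 28x28 images
print(labels.shape)  # torch.Size([64])            -- one integer label (0-9) per image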
Step 4: Define the neural network model
We define a simple neural network class Net using PyTorch's nn.Module. The network consists of two fully connected layers (fc1 and fc2). Here's a breakdown of the code:
- __init__(self): This is the constructor method where the network architecture is defined. It initializes two fully connected layers using nn.Linear. The first layer (fc1) takes an input of size 28*28 (the input images are 28×28 pixels, flattened into a vector) and outputs 512 features. The second layer (fc2) takes the 512 features from the first layer as input and outputs 10 classes (this is a classification task with 10 classes).
- forward(self, x): This method defines the forward pass of the network. It takes an input tensor x (representing a batch of images) and performs the following operations:
  - Flattens the input tensor into a vector using x.view(-1, 28*28).
  - Passes the flattened input through the first fully connected layer (fc1) and applies the ReLU activation function using F.relu(self.fc1(x)).
  - Passes the output of the first layer through the second fully connected layer (fc2) to get the final output logits.
Python
# Define the neural network model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # The network has two fully connected layers
        self.fc1 = nn.Linear(28 * 28, 512)  # The first layer takes the flattened image as input and outputs 512 features
        self.fc2 = nn.Linear(512, 10)       # The second layer takes the 512 features as input and outputs 10 classes

    def forward(self, x):
        # The forward pass of the network
        x = x.view(-1, 28 * 28)  # Flatten the image into a vector
        x = F.relu(self.fc1(x))  # Apply the ReLU activation function to the first layer
        x = self.fc2(x)          # Apply the second layer
        return x                 # Return the output logits
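A quick way to verify the architecture is to push a random dummy batch through an instance of Net. This sketch (illustrative only) checks the output shape:
Python
# Sanity-check the model with a random dummy batch
net = Net()
dummy = torch.randn(4, 1, 28, 28)  # a fake batch of 4 MNIST-sized images
logits = net(dummy)
print(logits.shape)  # torch.Size([4, 10]) -- one logit per class for each image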
Step 5: Define the loss function, the optimizer, and an instance of the model
The provided code segment initializes the neural network model, moves it to the available device (either CPU or GPU), and defines the loss function along with the optimizer.
Python
# Create an instance of the model and move it to the device (CPU or GPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # Get the device
model = Net().to(device)  # Move the model to the device
print(model)  # Print the model summary

# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()  # The cross-entropy loss for multi-class classification
optimizer = optim.SGD(model.parameters(), lr=learning_rate)  # The stochastic gradient descent optimizer

# Define a function to calculate the accuracy of the model
def accuracy(outputs, labels):
    # The accuracy is the fraction of correct predictions
    _, preds = torch.max(outputs, 1)  # Get the predicted classes from the output logits
    return torch.sum(preds == labels).item() / len(labels)  # Return the ratio of correct predictions
Output:
Net(
(fc1): Linear(in_features=784, out_features=512, bias=True)
(fc2): Linear(in_features=512, out_features=10, bias=True)
)
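To see how the accuracy helper behaves, here is a tiny hand-built example (illustrative only) with three samples and three classes:
Python
# Demonstrate the accuracy helper on a tiny hand-made batch
logits = torch.tensor([[2.0, 0.1, 0.3],   # predicted class 0
                       [0.2, 1.5, 0.1],   # predicted class 1
                       [0.3, 0.2, 0.9]])  # predicted class 2
labels = torch.tensor([0, 1, 0])          # the last prediction is wrong
print(accuracy(logits, labels))           # 0.6666... (2 out of 3 correct)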
Step 6: Define the training and test loops
- train(model, device, train_loader, criterion, optimizer, epoch): This function trains the model on the training data. It sets the model to training mode, loops over batches of data from the train_loader, moves the inputs and labels to the specified device, performs a forward pass through the model to get the output logits, calculates the loss using the specified criterion, performs a backward pass to compute gradients, and updates the model parameters using the specified optimizer. Every 200 batches it prints the average loss and accuracy.
- test(model, device, test_loader, criterion): This function evaluates the model on the test data. It sets the model to evaluation mode, loops over batches of data from the test_loader, moves the inputs and labels to the specified device, performs a forward pass through the model to get the output logits, calculates the loss using the specified criterion, and prints the average loss and accuracy over all batches.
Python
# Define the training loop
def train(model, device, train_loader, criterion, optimizer, epoch):
    # Set the model to training mode
    model.train()
    # Initialize the running loss and accuracy
    running_loss = 0.0
    running_acc = 0.0
    # Loop over the batches of data
    for i, (inputs, labels) in enumerate(train_loader):
        # Move the inputs and labels to the device
        inputs = inputs.to(device)
        labels = labels.to(device)
        # Zero the parameter gradients
        optimizer.zero_grad()
        # Forward pass
        outputs = model(inputs)            # Get the output logits from the model
        loss = criterion(outputs, labels)  # Calculate the loss
        # Backward pass and optimize
        loss.backward()   # Compute the gradients
        optimizer.step()  # Update the parameters
        # Print the statistics
        running_loss += loss.item()               # Accumulate the loss
        running_acc += accuracy(outputs, labels)  # Accumulate the accuracy
        if (i + 1) % 200 == 0:  # Print every 200 batches
            print(f'Epoch {epoch}, Batch {i + 1}, Loss: {running_loss / 200:.4f}, Accuracy: {running_acc / 200:.4f}')
            running_loss = 0.0
            running_acc = 0.0

# Define the test loop
def test(model, device, test_loader, criterion):
    # Set the model to evaluation mode
    model.eval()
    # Initialize the loss and accuracy
    test_loss = 0.0
    test_acc = 0.0
    # Loop over the batches of data
    with torch.no_grad():  # No need to track the gradients
        for inputs, labels in test_loader:
            # Move the inputs and labels to the device
            inputs = inputs.to(device)
            labels = labels.to(device)
            # Forward pass
            outputs = model(inputs)            # Get the output logits from the model
            loss = criterion(outputs, labels)  # Calculate the loss
            # Accumulate the statistics
            test_loss += loss.item()               # Accumulate the loss
            test_acc += accuracy(outputs, labels)  # Accumulate the accuracy
    # Print the average loss and accuracy
    print(f'Test Loss: {test_loss / len(test_loader):.4f}, Test Accuracy: {test_acc / len(test_loader):.4f}')
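An optional sanity check (not in the original tutorial) is to run the test loop once before training: with randomly initialized weights, the accuracy should sit near chance level, about 0.10 for ten classes, and the loss near ln(10) ≈ 2.3:
Python
# Evaluate the untrained model -- accuracy should be close to 10% (random guessing)
test(model, device, test_loader, criterion)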
Step 7: Train and test the model, and visualize sample images and predictions
This code segment trains and tests the model for the specified number of epochs, then visualizes some sample test images along with the model's predictions.
Python
# Train and test the model for the specified number of epochs
for epoch in range(1, num_epochs + 1):
    train(model, device, train_loader, criterion, optimizer, epoch)  # Train the model
    test(model, device, test_loader, criterion)                      # Test the model

# Visualize some sample images and predictions
samples, labels = next(iter(test_loader))  # Get a batch of test data
samples = samples.to(device)               # Move the samples to the device
outputs = model(samples)                   # Get the output logits from the model
_, preds = torch.max(outputs, 1)           # Get the predicted classes from the output logits
samples = samples.cpu().numpy()            # Move the samples back to CPU and convert to a numpy array
fig, axes = plt.subplots(3, 3, figsize=(8, 8))  # Create a 3x3 grid of subplots
for i, ax in enumerate(axes.ravel()):
    ax.imshow(samples[i].squeeze(), cmap='gray')                 # Plot the image
    ax.set_title(f'Label: {labels[i]}, Prediction: {preds[i]}')  # Set the title
    ax.axis('off')                                               # Hide the axes
plt.tight_layout()  # Adjust the spacing
plt.show()          # Show the plot
Output:
Epoch 1, Batch 200, Loss: 1.1144, Accuracy: 0.7486
Epoch 1, Batch 400, Loss: 0.4952, Accuracy: 0.8739
Epoch 1, Batch 600, Loss: 0.3917, Accuracy: 0.8903
Epoch 1, Batch 800, Loss: 0.3515, Accuracy: 0.9042
Test Loss: 0.3018, Test Accuracy: 0.9155
Epoch 2, Batch 200, Loss: 0.3067, Accuracy: 0.9123
Epoch 2, Batch 400, Loss: 0.2929, Accuracy: 0.9168
Epoch 2, Batch 600, Loss: 0.2878, Accuracy: 0.9185
Epoch 2, Batch 800, Loss: 0.2735, Accuracy: 0.9210
Test Loss: 0.2471, Test Accuracy: 0.9314
Epoch 3, Batch 200, Loss: 0.2580, Accuracy: 0.9256
Epoch 3, Batch 400, Loss: 0.2442, Accuracy: 0.9301
Epoch 3, Batch 600, Loss: 0.2354, Accuracy: 0.9338
Epoch 3, Batch 800, Loss: 0.2281, Accuracy: 0.9359
Test Loss: 0.2130, Test Accuracy: 0.9403
Epoch 4, Batch 200, Loss: 0.2149, Accuracy: 0.9403
Epoch 4, Batch 400, Loss: 0.2055, Accuracy: 0.9441
Epoch 4, Batch 600, Loss: 0.2050, Accuracy: 0.9395
Epoch 4, Batch 800, Loss: 0.2018, Accuracy: 0.9425
Test Loss: 0.1860, Test Accuracy: 0.9465
Epoch 5, Batch 200, Loss: 0.1925, Accuracy: 0.9464
Epoch 5, Batch 400, Loss: 0.1850, Accuracy: 0.9473
Epoch 5, Batch 600, Loss: 0.1813, Accuracy: 0.9481
Epoch 5, Batch 800, Loss: 0.1753, Accuracy: 0.9503
Test Loss: 0.1691, Test Accuracy: 0.9517
Epoch 6, Batch 200, Loss: 0.1719, Accuracy: 0.9521
Epoch 6, Batch 400, Loss: 0.1599, Accuracy: 0.9557
Epoch 6, Batch 600, Loss: 0.1627, Accuracy: 0.9521
Epoch 6, Batch 800, Loss: 0.1567, Accuracy: 0.9562
Test Loss: 0.1549, Test Accuracy: 0.9547
Epoch 7, Batch 200, Loss: 0.1441, Accuracy: 0.9620
Epoch 7, Batch 400, Loss: 0.1474, Accuracy: 0.9587
Epoch 7, Batch 600, Loss: 0.1447, Accuracy: 0.9601
Epoch 7, Batch 800, Loss: 0.1426, Accuracy: 0.9580
Test Loss: 0.1404, Test Accuracy: 0.9602
Epoch 8, Batch 200, Loss: 0.1360, Accuracy: 0.9627
Epoch 8, Batch 400, Loss: 0.1359, Accuracy: 0.9620
Epoch 8, Batch 600, Loss: 0.1304, Accuracy: 0.9631
Epoch 8, Batch 800, Loss: 0.1322, Accuracy: 0.9634
Test Loss: 0.1308, Test Accuracy: 0.9624
Epoch 9, Batch 200, Loss: 0.1152, Accuracy: 0.9690
Epoch 9, Batch 400, Loss: 0.1188, Accuracy: 0.9674
Epoch 9, Batch 600, Loss: 0.1303, Accuracy: 0.9637
Epoch 9, Batch 800, Loss: 0.1236, Accuracy: 0.9645
Test Loss: 0.1234, Test Accuracy: 0.9633
Epoch 10, Batch 200, Loss: 0.1112, Accuracy: 0.9679
Epoch 10, Batch 400, Loss: 0.1120, Accuracy: 0.9707
Epoch 10, Batch 600, Loss: 0.1158, Accuracy: 0.9681
Epoch 10, Batch 800, Loss: 0.1138, Accuracy: 0.9688
Test Loss: 0.1145, Test Accuracy: 0.9665
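After training, you may want to classify a single image rather than a whole batch. Here is a minimal sketch, assuming the trained model and the test_dataset from above:
Python
# Predict the class of a single test image with the trained model
model.eval()
image, label = test_dataset[0]  # one (tensor, label) pair
with torch.no_grad():
    logits = model(image.unsqueeze(0).to(device))  # add a batch dimension
    pred = logits.argmax(dim=1).item()
print(f'True label: {label}, Predicted: {pred}')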