Train the model
In this section, we'll put He initialization to work by training a small neural network on a custom-defined dataset. We'll generate a synthetic dataset for binary classification and train a network to solve the task.
Here’s a step-by-step implementation using PyTorch:
In this example, we construct a binary classification dataset, define a basic neural network with He initialization, and train it using stochastic gradient descent (SGD) as the optimizer. We use the same library imports as before.
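For reference, these are the imports used throughout this example (the same ones discussed earlier in the article):

```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
```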
Generating Dataset
We define a generate_dataset function that generates random 2D features and assigns binary labels, giving us a custom-defined dataset for binary classification in the PyTorch framework.
```python
# Create a custom-defined dataset
def generate_dataset(num_samples=100):
    np.random.seed(0)
    features = np.random.rand(num_samples, 2)  # Random 2D features
    labels = (features[:, 0] + features[:, 1] > 1).astype(int)  # Binary classification
    return torch.tensor(features, dtype=torch.float32), torch.tensor(labels, dtype=torch.float32)
```
Defining the Neural Network with He Initialization
Here, we define a SimpleClassifier class with two fully connected layers and apply He initialization (kaiming_normal_) to the weights of both layers.
```python
# Define the neural network with He initialization
class SimpleClassifier(nn.Module):
    def __init__(self):
        super(SimpleClassifier, self).__init__()
        self.fc1 = nn.Linear(2, 64)
        self.fc2 = nn.Linear(64, 1)
        # Apply He initialization to the layers
        nn.init.kaiming_normal_(self.fc1.weight)
        nn.init.kaiming_normal_(self.fc2.weight)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.sigmoid(self.fc2(x))
        return x
```
Setting Hyperparameters
We set the hyperparameters for the model. Choosing hyperparameters is an important step in the ML workflow, since it allows you to fine-tune how the model trains.
```python
# Hyperparameters
learning_rate = 0.01
epochs = 1000
batch_size = 16
```
Feel free to change the hyperparameters, dataset size, or model architecture to explore further and tailor the example to your own needs.
Creating the Dataset and DataLoader
In this code, batch_size determines how many samples are processed in each iteration. We create a DataLoader, which lets us iterate through the dataset in shuffled batches.
```python
# Create the dataset and dataloader
features, labels = generate_dataset()
dataset = torch.utils.data.TensorDataset(features, labels)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)
```
Initializing the Model and Optimizer
Here, we initialize the SimpleClassifier model, an optimizer, and a binary cross-entropy loss function using the PyTorch framework. The optimizer chosen for training is stochastic gradient descent (SGD); the learning rate determines how large each weight update is.
Since this is a binary classification model, the loss function is set to binary cross-entropy, which measures the dissimilarity between the predicted probabilities and the actual binary labels.
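For a concrete sense of what this loss computes: for a predicted probability p and target label y, binary cross-entropy is -[y·log(p) + (1-y)·log(1-p)]. A quick illustrative check (the values here are made up for demonstration):

```python
# Illustrative check: BCE for a single prediction p = 0.8 with target y = 1
# Expected value: -log(0.8) ≈ 0.2231
bce = nn.BCELoss()
print(bce(torch.tensor([0.8]), torch.tensor([1.0])))  # tensor(0.2231)
```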
```python
# Initialize the model and optimizer
model = SimpleClassifier()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
criterion = nn.BCELoss()  # Binary Cross-Entropy Loss
```
After setting up these components, let's proceed to train the model.
Training Loop
In this code snippet, we write a loop that trains the model for the specified number of epochs, using the DataLoader to iterate through the dataset:
- optimizer.zero_grad(): resets the gradients of the model's parameters to zero at the start of each batch
- model(inputs): generates predictions for the input batch (the forward pass)
- criterion(outputs, targets.view(-1, 1)): calculates the binary cross-entropy loss
- loss.backward(): computes gradients via backpropagation
- optimizer.step(): updates the weights
- loss.item(): keeps track of the cumulative loss for the entire epoch
```python
# Training loop
for epoch in range(epochs):
    total_loss = 0
    for inputs, targets in dataloader:
        optimizer.zero_grad()                           # Zero the gradients
        outputs = model(inputs)                         # Forward pass
        loss = criterion(outputs, targets.view(-1, 1))  # Calculate the loss
        loss.backward()                                 # Backpropagation
        optimizer.step()                                # Update weights
        total_loss += loss.item()

    # Print the average loss every 100 epochs
    if (epoch + 1) % 100 == 0:
        average_loss = total_loss / len(dataloader)
        print(f"Epoch [{epoch + 1}/{epochs}] - Loss: {average_loss:.4f}")
```
Output:
Epoch [100/1000] - Loss: 0.4184
Epoch [200/1000] - Loss: 0.2807
Epoch [300/1000] - Loss: 0.2209
Epoch [400/1000] - Loss: 0.1875
Epoch [500/1000] - Loss: 0.1531
Epoch [600/1000] - Loss: 0.1704
Epoch [700/1000] - Loss: 0.1382
Epoch [800/1000] - Loss: 0.1160
Epoch [900/1000] - Loss: 0.1246
Epoch [1000/1000] - Loss: 0.1028
Evaluation
Lastly, we evaluate the model's performance on a test dataset and print the accuracy, which gives an idea of how well the model performs. Note that because generate_dataset seeds NumPy with 0 on every call, these 20 "test" samples are identical to the first 20 training samples, which explains the perfect accuracy reported below; a proper evaluation would use a held-out split (see the sketch after the output).
- model.eval(): sets the model to evaluation mode
- with torch.no_grad(): temporarily disables gradient computation
- predictions = model(test_samples).round().squeeze().numpy(): passes the test samples through the trained model to get predictions
- model(test_samples): forward pass on the test samples
- .round(): rounds the predicted probabilities to 0 or 1
- .squeeze(): removes unnecessary dimensions
- .numpy(): converts the predictions from a PyTorch tensor to a NumPy array
Finally, we print the accuracy.
```python
# Evaluate the trained model
model.eval()
with torch.no_grad():
    test_samples, test_labels = generate_dataset(num_samples=20)
    predictions = model(test_samples).round().squeeze().numpy()
    accuracy = (predictions == test_labels.numpy()).mean()
    print(f"Test Accuracy: {accuracy * 100:.2f}%")
```
Output:
Test Accuracy: 100.00%
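Because of the fixed seed noted above, a genuinely held-out evaluation needs fresh data. Here is a minimal sketch, assuming a hypothetical seeded variant of the dataset generator (generate_dataset_seeded is an illustrative helper, not part of the code above):

```python
# Hypothetical variant: accept a seed so evaluation data differs from training data
def generate_dataset_seeded(num_samples=100, seed=0):
    rng = np.random.default_rng(seed)
    features = rng.random((num_samples, 2))
    labels = (features[:, 0] + features[:, 1] > 1).astype(int)
    return (torch.tensor(features, dtype=torch.float32),
            torch.tensor(labels, dtype=torch.float32))

model.eval()
with torch.no_grad():
    # A different seed produces samples the model has never seen
    test_samples, test_labels = generate_dataset_seeded(num_samples=200, seed=42)
    predictions = model(test_samples).round().squeeze().numpy()
    accuracy = (predictions == test_labels.numpy()).mean()
    print(f"Held-out Accuracy: {accuracy * 100:.2f}%")
```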
Selecting a weight initialization method depends on the activation function, the network architecture, and the nature of the problem. A recommended approach is to try out various weight initialization techniques and closely observe the training process, including metrics such as training loss and convergence speed. This way, you can identify the most suitable initialization method for your particular problem, as the sketch below illustrates.
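As one way to run such a comparison, here is a minimal sketch that reuses the dataloader, criterion, and learning_rate defined above and retrains the same architecture under two schemes (He normal and Xavier uniform; the choice of schemes and the shortened loop are illustrative):

```python
# Compare weight-initialization schemes by their training loss
init_schemes = {
    "he_normal": nn.init.kaiming_normal_,
    "xavier_uniform": nn.init.xavier_uniform_,
}

for name, init_fn in init_schemes.items():
    torch.manual_seed(0)                 # same RNG state for a fair comparison
    net = SimpleClassifier()             # note: __init__ already applies He init
    init_fn(net.fc1.weight)              # overwrite with the scheme under test
    init_fn(net.fc2.weight)
    opt = optim.SGD(net.parameters(), lr=learning_rate)
    for epoch in range(100):             # short run, enough to see a trend
        for inputs, targets in dataloader:
            opt.zero_grad()
            loss = criterion(net(inputs), targets.view(-1, 1))
            loss.backward()
            opt.step()
    print(f"{name}: final-batch loss {loss.item():.4f}")
```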
Choosing an appropriate weight initialization technique is an important step in building successful deep neural networks with PyTorch, as it can have a considerable influence on your model's convergence speed and overall performance. You can make an informed choice by taking into account factors such as the activation functions, network depth, and the presence of batch normalization. Remember that experimentation and fine-tuning are essential for determining the best weight initialization for your particular scenario. With PyTorch's flexibility at your disposal, you can build DNNs that learn effectively and deliver strong results.