Customizing Optimizers
There are many ways to customize optimizers in PyTorch. Some of them are as follows:
Changing the learning rate schedule:
The learning rate of the optimizer can be changed during training using a learning rate scheduler. PyTorch provides several built-in schedulers such as torch.optim.lr_scheduler.StepLR and torch.optim.lr_scheduler.ExponentialLR. We can also create our own scheduler by inheriting from the torch.optim.lr_scheduler.LRScheduler class (exposed as _LRScheduler in older PyTorch releases).
The code below uses the torch.optim.lr_scheduler.StepLR scheduler, which multiplies the learning rate by a factor of gamma every step_size calls to scheduler.step(). Because scheduler.step() is called once per epoch here, the learning rate drops by a factor of 10 every 10 epochs.
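As a rough illustration of writing a custom scheduler, the sketch below subclasses the scheduler base class and overrides get_lr(). The class name LinearWarmup and its warmup_steps argument are hypothetical, chosen only for this example, and the import fallback assumes that older releases expose the base class as _LRScheduler.

```python
import torch

# LRScheduler is the public name in recent releases; older ones expose _LRScheduler.
try:
    from torch.optim.lr_scheduler import LRScheduler
except ImportError:
    from torch.optim.lr_scheduler import _LRScheduler as LRScheduler


class LinearWarmup(LRScheduler):
    """Hypothetical scheduler: ramp linearly up to the base lr over warmup_steps."""

    def __init__(self, optimizer, warmup_steps, last_epoch=-1):
        # Must be set before super().__init__, which already calls get_lr() once
        self.warmup_steps = warmup_steps
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        # self.last_epoch is 0 right after construction and grows by 1 per step()
        scale = min(1.0, (self.last_epoch + 1) / self.warmup_steps)
        return [base_lr * scale for base_lr in self.base_lrs]


param = torch.nn.Parameter(torch.zeros(1))
param.grad = torch.zeros_like(param)  # dummy gradient so optimizer.step() has work to do
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = LinearWarmup(optimizer, warmup_steps=5)

lrs = []
for _ in range(7):
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()
print(lrs)
```

The recorded learning rates rise in equal increments for the first five steps and then stay at the base value of 0.1.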
Python3
import torch
import torch.nn as nn

# Placeholder model and data (assumed here so the snippet runs on its own)
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
X, y = torch.randn(32, 10), torch.randn(32, 1)

# Initialize an optimizer with a fixed learning rate
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# Create a learning rate scheduler
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

num_epochs = 200
# In the training loop
for i in range(num_epochs):
    # Perform the training step
    optimizer.zero_grad()
    y_pred = model(X)
    loss = criterion(y_pred, y)
    loss.backward()
    optimizer.step()
    # Update the learning rate
    scheduler.step()
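To see what this schedule does over the whole run, the decayed learning rate can be computed by hand. The helper step_lr below is a hypothetical pure-Python function, not part of PyTorch; it assumes the StepLR rule of multiplying by gamma once every step_size calls to scheduler.step().

```python
def step_lr(initial_lr, gamma, step_size, epoch):
    """Learning rate after `epoch` calls to scheduler.step() under a StepLR-style rule."""
    return initial_lr * gamma ** (epoch // step_size)


# Same settings as the training loop above: lr=0.01, gamma=0.1, step_size=10
for epoch in (0, 9, 10, 20, 199):
    print(epoch, step_lr(0.01, 0.1, 10, epoch))
```

With these settings the learning rate is divided by 10 nineteen times over 200 epochs, ending near 1e-21, which is effectively zero. In practice a larger step_size or a gamma closer to 1 is usually a better fit for a run of that length.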
Adding regularization:
To add regularization to the optimizer, we can modify the step() method to include the regularization term in the update of the model parameters. For example, we can add L1 or L2 regularization by modifying the step() method to include a term that penalizes the absolute or squared values of the parameters respectively.
Python3
import math
import torch

# Define custom optimizer
class MyAdam(torch.optim.Adam):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), weight_decay=0):
        # weight_decay is not forwarded to Adam; step() below applies it itself
        super().__init__(params, lr=lr, betas=betas)
        self.weight_decay = weight_decay

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                grad = p.grad
                if grad.is_sparse:
                    raise RuntimeError("Adam does not support sparse gradients")

                state = self.state[p]
                # State initialization
                if len(state) == 0:
                    state["step"] = 0
                    # Exponential moving average of gradient values
                    state["exp_avg"] = torch.zeros_like(p)
                    # Exponential moving average of squared gradient values
                    state["exp_avg_sq"] = torch.zeros_like(p)

                exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
                beta1, beta2 = group["betas"]
                state["step"] += 1

                if self.weight_decay != 0:
                    # L2 penalty: add weight_decay * p to the gradient
                    grad = grad.add(p, alpha=self.weight_decay)

                # Update the running averages of the gradient and its square
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

                denom = exp_avg_sq.sqrt().add_(group["eps"])
                bias_correction1 = 1 - beta1 ** state["step"]
                bias_correction2 = 1 - beta2 ** state["step"]
                step_size = group["lr"] * math.sqrt(bias_correction2) / bias_correction1
                p.addcdiv_(exp_avg, denom, value=-step_size)

# Optimizer
optimizer = MyAdam(model.parameters(), weight_decay=0.00002)
In the above code, we create a custom Adam optimizer that supports weight decay regularization by adding a weight_decay parameter to the optimizer and modifying the step() method to include the weight decay term in the parameter update. The term is applied to the gradients by the grad.add call with alpha=self.weight_decay, which adds weight_decay times each parameter to its gradient. This is equivalent to an L2 penalty on the parameters and pulls large parameter values toward zero.
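Adding weight_decay * p to the gradient gives the same gradient as adding an L2 penalty of (weight_decay / 2) * sum(p ** 2) to the loss. The short check below illustrates that equivalence with autograd; the loss function and all variable names are made up for this sketch.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)
weight_decay = 0.1

# Gradient of the plain loss, with the weight decay term added by hand
loss = (w * x).sum() ** 2
(grad_plain,) = torch.autograd.grad(loss, w)
grad_decayed = grad_plain + weight_decay * w.detach()

# Gradient of the same loss with an explicit L2 penalty
loss_l2 = (w * x).sum() ** 2 + 0.5 * weight_decay * (w ** 2).sum()
(grad_l2,) = torch.autograd.grad(loss_l2, w)

print(torch.allclose(grad_decayed, grad_l2))
```

The built-in torch.optim.Adam also accepts a weight_decay argument that applies this kind of penalty, and torch.optim.AdamW applies a decoupled variant, so a subclass like the one above is mainly useful for penalties the library does not already offer.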
Implementing a new optimization algorithm:
PyTorch provides several built-in optimization algorithms, such as SGD, Adam, and Adagrad. However, there are many other optimization algorithms that are not included in the library. By creating a custom optimizer, we can implement any optimization algorithm that we want.
Python3
import torch

class MyOptimizer(torch.optim.Optimizer):
    def __init__(self, params, lr=0.01):
        defaults = dict(lr=lr)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                # Update with the element-wise squared gradient
                p.add_(p.grad ** 2, alpha=-group['lr'])

optimizer = MyOptimizer(model.parameters(), lr=0.001)
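In this example, we created a new optimization algorithm called MyOptimizer that updates the parameters using the squared gradient values instead of the gradients themselves. It illustrates the structure of a custom optimizer only: squaring discards the sign of the gradient, so every parameter can only decrease and the rule does not minimize the loss in general.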
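Many real algorithms also need to remember something between steps. As a sketch of how per-parameter state fits into a custom optimizer, the hypothetical MomentumSGD class below keeps a velocity buffer in self.state. It is written to mirror the basic momentum rule (no dampening, no Nesterov) and is not meant as a replacement for torch.optim.SGD.

```python
import torch


class MomentumSGD(torch.optim.Optimizer):
    """Hypothetical example: gradient descent with a momentum (velocity) buffer."""

    def __init__(self, params, lr=0.01, momentum=0.9):
        defaults = dict(lr=lr, momentum=momentum)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            # The closure re-evaluates the loss, so it needs gradients enabled
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    # Per-parameter state is created lazily on the first step
                    state["velocity"] = torch.zeros_like(p)
                velocity = state["velocity"]
                velocity.mul_(group["momentum"]).add_(p.grad)
                p.add_(velocity, alpha=-group["lr"])
        return loss


# Minimize (w - 3)^2 starting from w = 0
w = torch.nn.Parameter(torch.tensor([0.0]))
optimizer = MomentumSGD([w], lr=0.1, momentum=0.9)
for _ in range(200):
    optimizer.zero_grad()
    loss = ((w - 3.0) ** 2).sum()
    loss.backward()
    optimizer.step()
print(w.item())
```

Storing the buffer in self.state, keyed by parameter, lets the base class save and restore it through state_dict() and load_state_dict().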
Using multiple optimizers:
In some cases, we may want to use different optimizers for different parts of the model. For example, we may want to use Adam for the parameters of the convolutional layers, and SGD for the parameters of the fully-connected layers. This can be achieved by creating multiple instances of the optimizer, one for each set of parameters.
Python3
# Define different optimizers for different parts of the model
params1 = model.conv_layers.parameters()
params2 = model.fc_layers.parameters()
optimizer1 = torch.optim.Adam(params1)
optimizer2 = torch.optim.SGD(params2, lr=0.01)

# In the training loop
for i in range(num_epochs):
    # Perform the training step
    ...
    optimizer1.zero_grad()
    optimizer2.zero_grad()
    loss.backward()
    optimizer1.step()
    optimizer2.step()
In this example, we are using Adam optimizer for the parameters of the convolutional layers, and SGD optimizer with a fixed learning rate of 0.01 for the parameters of the fully-connected layers. This can help fine-tune the training of specific parts of the model.
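A self-contained version of this pattern might look like the sketch below. The TinyNet model, the random data, and the loss function are placeholders invented for illustration; only the conv_layers and fc_layers attribute names come from the example above.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical model with separate convolutional and fully-connected parts."""

    def __init__(self):
        super().__init__()
        self.conv_layers = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3, padding=1), nn.ReLU()
        )
        self.fc_layers = nn.Sequential(nn.Flatten(), nn.Linear(4 * 8 * 8, 2))

    def forward(self, x):
        return self.fc_layers(self.conv_layers(x))


torch.manual_seed(0)
model = TinyNet()
X = torch.randn(16, 1, 8, 8)        # random 8x8 single-channel "images"
y = torch.randint(0, 2, (16,))      # random binary labels
criterion = nn.CrossEntropyLoss()

optimizer1 = torch.optim.Adam(model.conv_layers.parameters())
optimizer2 = torch.optim.SGD(model.fc_layers.parameters(), lr=0.01)

conv_before = model.conv_layers[0].weight.detach().clone()
fc_before = model.fc_layers[1].weight.detach().clone()

for epoch in range(5):
    optimizer1.zero_grad()
    optimizer2.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()                 # one backward pass fills gradients for both parts
    optimizer1.step()
    optimizer2.step()

conv_changed = not torch.equal(conv_before, model.conv_layers[0].weight)
fc_changed = not torch.equal(fc_before, model.fc_layers[1].weight)
print(conv_changed, fc_changed)
```

A single call to loss.backward() is enough because each optimizer only touches the parameters it was constructed with.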
Custom Optimizers in PyTorch
In PyTorch, an optimizer is a specific implementation of the optimization algorithm that is used to update the parameters of a neural network. The optimizer updates the parameters in such a way that the loss of the neural network is minimized. PyTorch provides various built-in optimizers such as SGD, Adam, Adagrad, etc. that can be used out of the box. However, in some cases, the built-in optimizers may not be suitable for a particular problem or may not perform well. In such cases, one can create their own custom optimizer.
A custom optimizer in PyTorch is a class that inherits from the torch.optim.Optimizer base class. The custom optimizer should implement the __init__ and step methods. The __init__ method passes the parameters and a dictionary of default hyperparameters to the base class and sets up the optimizer's internal state, and the step method updates the parameters of the model.
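The smallest useful instance of this structure is plain gradient descent. The sketch below is one possible skeleton, with the hypothetical class name PlainGD; it assumes nothing beyond the torch.optim.Optimizer base class and a single lr hyperparameter.

```python
import torch


class PlainGD(torch.optim.Optimizer):
    """Hypothetical minimal optimizer: p <- p - lr * grad."""

    def __init__(self, params, lr=0.01):
        if lr <= 0:
            raise ValueError(f"Invalid learning rate: {lr}")
        # Hyperparameters in `defaults` are copied into every parameter group
        defaults = dict(lr=lr)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                p.add_(p.grad, alpha=-group["lr"])
        return loss


# Minimize (w - 3)^2 starting from w = 0
w = torch.nn.Parameter(torch.tensor([0.0]))
optimizer = PlainGD([w], lr=0.1)
for _ in range(50):
    optimizer.zero_grad()
    loss = ((w - 3.0) ** 2).sum()
    loss.backward()
    optimizer.step()
print(w.item())
```

Reading the learning rate from group["lr"] rather than from an instance attribute keeps the class compatible with learning rate schedulers, which adjust the value stored in each parameter group.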