How the Gradient Descent Algorithm Works

For the sake of simplicity, we can write our loss function for a single row as below:

J(w, b) = (\hat{y} - y)^2 = (wx + b - y)^2

In the above function, x and y are our input data, i.e. constants. To find the optimal values of the weight w and the bias b, we partially differentiate the loss with respect to w and b. In other words, we find the gradient of the loss function J(w, b) with respect to w and b.

Gradient of J(w, b) with respect to w:

\frac{\partial J}{\partial w} = 2x(wx + b - y)

Gradient of J(w, b) with respect to b:

\frac{\partial J}{\partial b} = 2(wx + b - y)
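
These formulas are easy to verify numerically. Below is a minimal sketch (the values of x, y, w and b are made up purely for illustration) that compares the hand-derived gradients with the ones PyTorch's autograd computes:

import torch

# Made-up single-row example to check the hand-derived gradients
x, y = torch.tensor(2.0), torch.tensor(5.0)
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)

loss = (w * x + b - y) ** 2          # J(w, b) for a single row
loss.backward()

# Analytical gradients: dJ/dw = 2x(wx + b - y), dJ/db = 2(wx + b - y)
print(w.grad, 2 * x * (w * x + b - y).detach())   # both -15.6
print(b.grad, 2 * (w * x + b - y).detach())       # both -7.8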

Here we have considered linear regression, so the only parameters are a weight and a bias. In a fully connected neural network model there can be multiple layers and many more parameters, but the concept is the same everywhere, and the formula below works everywhere.

\text{param} = \text{param} - \alpha \nabla_{\text{param}} J

Here, 

  • \alpha = Learning rate
  • J = Loss function
  • \nabla = Gradient symbol; \nabla_{\text{param}} J denotes the derivative of the loss function J with respect to the parameter
  • param = Weight or bias. There can be multiple weight and bias values depending on the complexity of the model and the number of features in the dataset.
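
In PyTorch this general rule can be applied uniformly to every parameter of a model, whatever its architecture. A minimal sketch, assuming model is an nn.Module whose gradients have already been filled in by loss.backward():

import torch

# General gradient descent step: param = param - learning_rate * dJ/dparam
learning_rate = 0.001

with torch.no_grad():                        # the update itself should not be tracked by autograd
    for param in model.parameters():
        param -= learning_rate * param.grad  # param.grad holds dJ/dparam after loss.backward()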

In our case:

w = w - \alpha \frac{\partial J}{\partial w}

b = b - \alpha \frac{\partial J}{\partial b}

In the current problem there are two input features, so there will be two weight values and one bias.
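
This can be confirmed by inspecting the parameter shapes of the linear layer (assuming the model defined in the previous section, i.e. a linear layer with two input features and one output):

print(model.linear.weight.shape)   # torch.Size([1, 2]) -> two weight values
print(model.linear.bias.shape)     # torch.Size([1])    -> one bias value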

Implementation of the Gradient Descent algorithm for the above model

Steps: 

  1. Find the gradients using loss.backward().
  2. Get the parameters using model.linear.weight and model.linear.bias.
  3. Update the parameters using the above-defined equation.
  4. Assign the updated parameters back to the model.
# Find the gradients of the loss with respect to the parameters
loss.backward()
# Learning rate
learning_rate = 0.001
# Model parameters
w = model.linear.weight
b = model.linear.bias
# Manually update the model parameters
w = w - learning_rate * w.grad
b = b - learning_rate * b.grad
# Assign the updated weight & bias back to the linear layer
model.linear.weight = nn.Parameter(w)
model.linear.bias   = nn.Parameter(b)
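
Because this snippet wraps the updated tensors in fresh nn.Parameter objects, the old gradients are discarded along with the old parameters. An equivalent way to perform the same step (a sketch, not the article's code) is to update the existing parameters in place under torch.no_grad() and zero the gradients explicitly so they do not accumulate:

import torch

# Equivalent in-place update of the existing parameters
loss.backward()
learning_rate = 0.001

with torch.no_grad():
    model.linear.weight -= learning_rate * model.linear.weight.grad
    model.linear.bias   -= learning_rate * model.linear.bias.grad

    # Reset the gradients before the next backward pass
    model.linear.weight.grad.zero_()
    model.linear.bias.grad.zero_()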

Now repeat this process for 1000 epochs.

Python3

# Number of epochs
num_epochs = 1000

# Learning rate
learning_rate = 0.001

# Subplots: weight vs loss and bias vs loss
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)

for epoch in range(num_epochs):
    # Forward pass
    y_p = model(x)
    loss = Mean_Squared_Error(y_p, y)

    # Backpropagation: compute the gradients
    loss.backward()

    # Model parameters
    w = model.linear.weight
    b = model.linear.bias

    # Manually update the model parameters
    w = w - learning_rate * w.grad
    b = b - learning_rate * b.grad

    # Assign the updated weight & bias back to the linear layer
    model.linear.weight = nn.Parameter(w)
    model.linear.bias   = nn.Parameter(b)

    if (epoch+1) % 100 == 0:
        ax1.plot(w.detach().numpy(), loss.item(), 'r*-')
        ax2.plot(b.detach().numpy(), loss.item(), 'g+-')
        print('Epoch [{}/{}], weight:{}, bias:{} Loss: {:.4f}'.format(
            epoch+1, num_epochs,
            w.detach().numpy(),
            b.detach().numpy(),
            loss.item()))

ax1.set_xlabel('weight')
ax2.set_xlabel('bias')
ax1.set_ylabel('Loss')
ax2.set_ylabel('Loss')
plt.show()

                    

Output:

Epoch [100/1000], weight:[[-0.2618025   0.44433367]], bias:[-0.17722966] Loss: 14.1803
Epoch [200/1000], weight:[[-0.21144074  0.35393423]], bias:[-0.7892358] Loss: 10.3030
Epoch [300/1000], weight:[[-0.17063744  0.28172654]], bias:[-1.2897989] Loss: 7.7120
Epoch [400/1000], weight:[[-0.13759881  0.22408141]], bias:[-1.699218] Loss: 5.9806
Epoch [500/1000], weight:[[-0.11086453  0.17808875]], bias:[-2.0340943] Loss: 4.8235
Epoch [600/1000], weight:[[-0.08924612  0.14141548]], bias:[-2.3080034] Loss: 4.0502
Epoch [700/1000], weight:[[-0.0717768   0.11219224]], bias:[-2.5320508] Loss: 3.5333
Epoch [800/1000], weight:[[-0.0576706   0.08892148]], bias:[-2.7153134] Loss: 3.1878
Epoch [900/1000], weight:[[-0.04628877  0.07040432]], bias:[-2.8652208] Loss: 2.9569
Epoch [1000/1000], weight:[[-0.0371125   0.05568104]], bias:[-2.9878428] Loss: 2.8026

Weight & Bias vs Losses

From the above graph and data, we can observe that the loss decreases as the weight and bias values are updated.
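
For reference, the same training loop can be written with PyTorch's built-in optimizer instead of the manual parameter update. A sketch (not the article's original code), assuming the same model, x, y, Mean_Squared_Error and num_epochs defined above:

import torch

# Same loop, but letting torch.optim.SGD perform the update
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

for epoch in range(num_epochs):
    y_p = model(x)                       # forward pass
    loss = Mean_Squared_Error(y_p, y)    # same loss function as above

    optimizer.zero_grad()                # clear old gradients
    loss.backward()                      # backpropagation
    optimizer.step()                     # param = param - lr * param.grad

    if (epoch + 1) % 100 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch + 1, num_epochs, loss.item()))

Here optimizer.step() performs exactly the update rule derived above (plain SGD, no momentum).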

Now we have found the optimal weight and bias values. Print the optimal weight and bias:

Python3

w = model.linear.weight
b = model.linear.bias
 
print('weight(W) = {} \n  bias(b) = {}'.format(
  w.abs(),
  b.abs()))

                    

Output:

weight(W) = tensor([[0.0371, 0.0557]], grad_fn=<AbsBackward0>) 
  bias(b) = tensor([2.9878], grad_fn=<AbsBackward0>)

Prediction

Python3

pred =  x @ w.T + b
pred[:5]

                    

Output:

tensor([[-2.9765],
        [-3.1385],
        [-3.0818],
        [-3.0756],
        [-2.8681]], grad_fn=<SliceBackward0>)
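
Since w and b were read from model.linear, the expression x @ w.T + b computes exactly what the linear layer itself computes, so the same predictions can be obtained by calling the model directly. A quick check (a sketch; gradient tracking is disabled because only the forward pass is needed):

import torch

# The manual formula x @ w.T + b matches the linear layer's forward pass
with torch.no_grad():
    print(model(x)[:5])   # should match pred[:5] above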
