How the Gradient Descent Algorithm Works

For the sake of simplicity, we can write our loss function for a single row as below:

J(w, b) = (\hat{y} - y)^2 = (wx + b - y)^2

In the above function, x and y are our input data, i.e. constants. To find the optimal values of the weight w and the bias b, we partially differentiate the loss with respect to w and b. In other words, we find the gradient of the loss function J(w, b) with respect to w and b.

Gradient of J(w, b) with respect to w:

\frac{\partial J}{\partial w} = 2x(wx + b - y)

Gradient of J(w, b) with respect to b:

\frac{\partial J}{\partial b} = 2(wx + b - y)
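
These formulas are easy to verify numerically. Below is a minimal sketch (the values of x, y, w and b are made up purely for illustration) that compares the hand-derived gradients with the ones PyTorch's autograd computes:

import torch

# Made-up single-row example to check the hand-derived gradients
x, y = torch.tensor(2.0), torch.tensor(5.0)
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)

loss = (w * x + b - y) ** 2          # J(w, b) for a single row
loss.backward()

# Analytical gradients: dJ/dw = 2x(wx + b - y), dJ/db = 2(wx + b - y)
print(w.grad, 2 * x * (w * x + b - y).detach())   # both -15.6
print(b.grad, 2 * (w * x + b - y).detach())       # both -7.8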

Here we have considered linear regression, so the only parameters are a weight and a bias. In a fully connected neural network model there can be multiple layers and many more parameters, but the concept is the same everywhere, and the formula below works everywhere.

\text{param} = \text{param} - \alpha \nabla_{\text{param}} J

Here, 

  • \alpha = Learning rate
  • J = Loss function
  • \nabla = Gradient symbol; \nabla_{\text{param}} J denotes the derivative of the loss function J with respect to the parameter
  • param = Weight or bias. There can be multiple weight and bias values depending on the complexity of the model and the number of features in the dataset.
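
In PyTorch this general rule can be applied uniformly to every parameter of a model, whatever its architecture. A minimal sketch, assuming model is an nn.Module whose gradients have already been filled in by loss.backward():

import torch

# General gradient descent step: param = param - learning_rate * dJ/dparam
learning_rate = 0.001

with torch.no_grad():                        # the update itself should not be tracked by autograd
    for param in model.parameters():
        param -= learning_rate * param.grad  # param.grad holds dJ/dparam after loss.backward()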

In our case:

w = w - \alpha \frac{\partial J}{\partial w}

b = b - \alpha \frac{\partial J}{\partial b}

In the current problem there are two input features, so there will be two weight values and one bias.
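
This can be confirmed by inspecting the parameter shapes of the linear layer (assuming the model defined in the previous section, i.e. a linear layer with two input features and one output):

print(model.linear.weight.shape)   # torch.Size([1, 2]) -> two weight values
print(model.linear.bias.shape)     # torch.Size([1])    -> one bias value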

Implementation of the Gradient Descent algorithm for the above model

Steps: 

  1. Find the gradients using loss.backward().
  2. Get the parameters using model.linear.weight and model.linear.bias.
  3. Update the parameters using the above-defined equation.
  4. Assign the updated parameters back to the model.
# Find the gradients of the loss with respect to the parameters
loss.backward()
# Learning rate
learning_rate = 0.001
# Model parameters
w = model.linear.weight
b = model.linear.bias
# Manually update the model parameters
w = w - learning_rate * w.grad
b = b - learning_rate * b.grad
# Assign the updated weight & bias back to the linear layer
model.linear.weight = nn.Parameter(w)
model.linear.bias   = nn.Parameter(b)
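
Because this snippet wraps the updated tensors in fresh nn.Parameter objects, the old gradients are discarded along with the old parameters. An equivalent way to perform the same step (a sketch, not the article's code) is to update the existing parameters in place under torch.no_grad() and zero the gradients explicitly so they do not accumulate:

import torch

# Equivalent in-place update of the existing parameters
loss.backward()
learning_rate = 0.001

with torch.no_grad():
    model.linear.weight -= learning_rate * model.linear.weight.grad
    model.linear.bias   -= learning_rate * model.linear.bias.grad

    # Reset the gradients before the next backward pass
    model.linear.weight.grad.zero_()
    model.linear.bias.grad.zero_()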

Now repeat this process for 1000 epochs.

Python3

# Number of epochs
num_epochs = 1000

# Learning rate
learning_rate = 0.001

# Subplots: weight vs loss and bias vs loss
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)

for epoch in range(num_epochs):
    # Forward pass
    y_p = model(x)
    loss = Mean_Squared_Error(y_p, y)

    # Backpropagation: compute the gradients
    loss.backward()

    # Model parameters
    w = model.linear.weight
    b = model.linear.bias

    # Manually update the model parameters
    w = w - learning_rate * w.grad
    b = b - learning_rate * b.grad

    # Assign the updated weight & bias back to the linear layer
    model.linear.weight = nn.Parameter(w)
    model.linear.bias   = nn.Parameter(b)

    if (epoch+1) % 100 == 0:
        ax1.plot(w.detach().numpy(), loss.item(), 'r*-')
        ax2.plot(b.detach().numpy(), loss.item(), 'g+-')
        print('Epoch [{}/{}], weight:{}, bias:{} Loss: {:.4f}'.format(
            epoch+1, num_epochs,
            w.detach().numpy(),
            b.detach().numpy(),
            loss.item()))

ax1.set_xlabel('weight')
ax2.set_xlabel('bias')
ax1.set_ylabel('Loss')
ax2.set_ylabel('Loss')
plt.show()

                    

Output:

Epoch [100/1000], weight:[[-0.2618025   0.44433367]], bias:[-0.17722966] Loss: 14.1803
Epoch [200/1000], weight:[[-0.21144074  0.35393423]], bias:[-0.7892358] Loss: 10.3030
Epoch [300/1000], weight:[[-0.17063744  0.28172654]], bias:[-1.2897989] Loss: 7.7120
Epoch [400/1000], weight:[[-0.13759881  0.22408141]], bias:[-1.699218] Loss: 5.9806
Epoch [500/1000], weight:[[-0.11086453  0.17808875]], bias:[-2.0340943] Loss: 4.8235
Epoch [600/1000], weight:[[-0.08924612  0.14141548]], bias:[-2.3080034] Loss: 4.0502
Epoch [700/1000], weight:[[-0.0717768   0.11219224]], bias:[-2.5320508] Loss: 3.5333
Epoch [800/1000], weight:[[-0.0576706   0.08892148]], bias:[-2.7153134] Loss: 3.1878
Epoch [900/1000], weight:[[-0.04628877  0.07040432]], bias:[-2.8652208] Loss: 2.9569
Epoch [1000/1000], weight:[[-0.0371125   0.05568104]], bias:[-2.9878428] Loss: 2.8026

Weight & Bias vs Losses

From the above graph and data, we can observe that the loss decreases as the weight and bias values are updated.
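
For reference, the same training loop can be written with PyTorch's built-in optimizer instead of the manual parameter update. A sketch (not the article's original code), assuming the same model, x, y, Mean_Squared_Error and num_epochs defined above:

import torch

# Same loop, but letting torch.optim.SGD perform the update
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

for epoch in range(num_epochs):
    y_p = model(x)                       # forward pass
    loss = Mean_Squared_Error(y_p, y)    # same loss function as above

    optimizer.zero_grad()                # clear old gradients
    loss.backward()                      # backpropagation
    optimizer.step()                     # param = param - lr * param.grad

    if (epoch + 1) % 100 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch + 1, num_epochs, loss.item()))

Here optimizer.step() performs exactly the update rule derived above (plain SGD, no momentum).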

Now we have found the optimal weight and bias values. Print the optimal weight and bias:

Python3

w = model.linear.weight
b = model.linear.bias
 
print('weight(W) = {} \n  bias(b) = {}'.format(
  w.abs(),
  b.abs()))

                    

Output:

weight(W) = tensor([[0.0371, 0.0557]], grad_fn=<AbsBackward0>) 
  bias(b) = tensor([2.9878], grad_fn=<AbsBackward0>)

Prediction

Python3

pred =  x @ w.T + b
pred[:5]

                    

Output:

tensor([[-2.9765],
        [-3.1385],
        [-3.0818],
        [-3.0756],
        [-2.8681]], grad_fn=<SliceBackward0>)
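
Since w and b were read from model.linear, the expression x @ w.T + b computes exactly what the linear layer itself computes, so the same predictions can be obtained by calling the model directly. A quick check (a sketch; gradient tracking is disabled because only the forward pass is needed):

import torch

# The manual formula x @ w.T + b matches the linear layer's forward pass
with torch.no_grad():
    print(model(x)[:5])   # should match pred[:5] above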
