Steps to Implement Chain Rule Derivative with Mathematical Notation
Let’s consider a simple example where we have a neural network with two layers. The forward pass of this network can be represented as:
[Tex]\hat{y} = W_2 \cdot \sigma(W_1 \cdot x + b_1) + b_2[/Tex]
where:
- x is the input
- W1 and W2 are the weight matrices of the first and second layers, respectively
- b1 and b2 are the biases of the first and second layers
- σ is the activation function (the sigmoid, in the example below)
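Viewed as a composition of intermediate steps, the forward pass makes the role of the chain rule explicit (note that the Python example later in the article also applies σ to the final output):
[Tex]z_1 = W_1 x + b_1, \quad a_1 = \sigma(z_1), \quad z_2 = W_2 a_1 + b_2[/Tex]
Each step is differentiable, so the gradient of a loss with respect to any weight or bias is a product of the derivatives of these steps.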
To compute the gradient of the loss function with respect to the weights W1 and W2 using backpropagation, we apply the chain rule step by step:
- Compute the derivative of the loss with respect to the output: [Tex]\frac{\partial L}{\partial z}[/Tex]
- Compute the derivative of the output with respect to each weight and bias, applying the chain rule at each step: [Tex]\frac{\partial z}{\partial W_2},\ \frac{\partial z}{\partial b_2},\ \frac{\partial z}{\partial W_1},\ \frac{\partial z}{\partial b_1}[/Tex]
- Update the weights and biases using gradient descent or another optimization algorithm (how these derivative factors combine is expanded just below).
As a concrete case, consider a neural network with one input layer, one hidden layer, and one output layer, using the sigmoid activation function.
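Writing [Tex]a_2 = \sigma(z_2)[/Tex] for the final output (as the code below does) and leaving the loss L unspecified, the chain-rule factors for this network multiply out as:
[Tex]\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial a_2}\cdot\frac{\partial a_2}{\partial z_2}\cdot\frac{\partial z_2}{\partial W_2}, \qquad \frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial a_2}\cdot\frac{\partial a_2}{\partial z_2}\cdot\frac{\partial z_2}{\partial a_1}\cdot\frac{\partial a_1}{\partial z_1}\cdot\frac{\partial z_1}{\partial W_1}[/Tex]
A gradient-descent update then takes the form [Tex]W \leftarrow W - \eta\,\frac{\partial L}{\partial W}[/Tex], where η denotes the learning rate.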
Python Implementation
Here’s a step-by-step explanation:
- Define the sigmoid activation function: The sigmoid function takes an input x and returns the sigmoid activation applied to x.
- Define the forward pass function: The forward_pass function takes an input x, weights W1 and W2, and biases b1 and b2, and performs the forward pass through the neural network. It calculates the output of the hidden layer (a1) and the output layer (a2) using the sigmoid activation function.
- Define the input: The input x is a NumPy array representing the features.
- Define weights and biases: W1 is a 2×2 matrix representing the weights of the connections between the input and the hidden layer. b1 is a 1×2 vector representing the biases of the hidden layer. W2 is a 1×2 vector representing the weights of the connections between the hidden layer and the output layer. b2 is a scalar representing the bias of the output layer.
- Perform the forward pass: The forward_pass function is called with the input x, weights W1 and W2, and biases b1 and b2, and it calculates the output of the neural network.
- Print the output: The calculated output of the neural network is printed.
import numpy as np

# Define sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Forward pass
def forward_pass(x, W1, b1, W2, b2):
    z1 = np.dot(W1, x) + b1   # pre-activation of the hidden layer
    a1 = sigmoid(z1)          # hidden-layer output
    z2 = np.dot(W2, a1) + b2  # pre-activation of the output layer
    a2 = sigmoid(z2)          # network output
    return a2

# Define input
x = np.array([0.5, 0.3])

# Define weights and biases
W1 = np.array([[0.1, 0.2], [0.3, 0.4]])
b1 = np.array([0.5, 0.6])
W2 = np.array([0.7, 0.8])
b2 = 0.9

# Perform forward pass
output = forward_pass(x, W1, b1, W2, b2)
print("Output:", output)
Output:
Output: 0.871843204787514
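The forward pass above is only half of the picture sketched in the chain-rule steps earlier. The following is a minimal backward-pass sketch for the same toy network; it assumes a squared-error loss [Tex]L = \frac{1}{2}(a_2 - y)^2[/Tex] and a hypothetical target y = 1.0, both illustrative choices that do not appear in the example above.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(a):
    # Derivative of the sigmoid, written in terms of its output a = sigmoid(z)
    return a * (1 - a)

def backward_pass(x, W1, b1, W2, b2, y):
    # Repeat the forward pass, keeping intermediates for the chain rule
    z1 = np.dot(W1, x) + b1
    a1 = sigmoid(z1)
    z2 = np.dot(W2, a1) + b2
    a2 = sigmoid(z2)

    # Output layer: dL/dz2 = dL/da2 * da2/dz2, with assumed loss L = 0.5*(a2 - y)**2
    delta2 = (a2 - y) * sigmoid_derivative(a2)
    dW2 = delta2 * a1              # dL/dW2 = dL/dz2 * dz2/dW2, and dz2/dW2 = a1
    db2 = delta2                   # dz2/db2 = 1

    # Hidden layer: propagate delta2 back through W2, then through the sigmoid
    delta1 = delta2 * W2 * sigmoid_derivative(a1)
    dW1 = np.outer(delta1, x)      # dz1/dW1 contributes the input x
    db1 = delta1

    return dW1, db1, dW2, db2

# Same toy parameters as above, plus the hypothetical target
x = np.array([0.5, 0.3])
W1 = np.array([[0.1, 0.2], [0.3, 0.4]])
b1 = np.array([0.5, 0.6])
W2 = np.array([0.7, 0.8])
b2 = 0.9

dW1, db1, dW2, db2 = backward_pass(x, W1, b1, W2, b2, y=1.0)
print("dL/dW2:", dW2)
print("dL/dW1:", dW1)

Each delta term is an accumulated product of chain-rule factors, and the returned gradients slot directly into the update rule [Tex]W \leftarrow W - \eta\,\frac{\partial L}{\partial W}[/Tex] mentioned earlier.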
In conclusion, the forward pass is a fundamental step in the operation of a neural network. It involves calculating the output of the network for a given input by propagating the input through the network’s layers, applying weights and biases, and using activation functions to introduce non-linearity. The forward pass is essential for making predictions with a neural network and is a building block for more complex operations like training and optimization.
Chain Rule Derivative in Machine Learning
In machine learning, understanding the chain rule and its application in computing derivatives is essential. The chain rule allows us to find the derivative of composite functions, which frequently arise in machine learning models due to their layered architecture. These models often involve multiple nested functions, and the chain rule helps us compute gradients efficiently for optimization algorithms like gradient descent.
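As a standalone illustration, the sketch below differentiates a composite function with the chain rule and checks the result numerically; the particular choices g(x) = x² and f(u) = sin(u) are arbitrary examples, not taken from anything above.

import numpy as np

# Composite function f(g(x)) with g(x) = x**2 and f(u) = sin(u)
def g(x):
    return x ** 2

def f(u):
    return np.sin(u)

# Chain rule: d/dx f(g(x)) = f'(g(x)) * g'(x) = cos(x**2) * 2x
def chain_rule_derivative(x):
    return np.cos(g(x)) * 2 * x

# Central-difference check of the analytic derivative
x, h = 1.3, 1e-6
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)
print("analytic:", chain_rule_derivative(x))
print("numeric: ", numeric)  # should agree to several decimal places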