Stochastic Gradient Descent Algorithm
- Initialization: Randomly initialize the parameters of the model.
- Set Hyperparameters: Choose the number of iterations and the learning rate (alpha) used to update the parameters.
- Stochastic Gradient Descent Loop: Repeat the following steps until the model converges or reaches the maximum number of iterations:
- Shuffle the training dataset to introduce randomness.
- Iterate over each training example (or a small batch) in the shuffled order.
- Compute the gradient of the cost function with respect to the model parameters using the current training example (or batch).
- Update the model parameters by taking a step in the direction of the negative gradient, scaled by the learning rate.
- Evaluate the convergence criteria, such as the change in the cost function between successive iterations.
- Return Optimized Parameters: Once the convergence criteria are met or the maximum number of iterations is reached, return the optimized model parameters.
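Putting these steps together, here is a minimal Python sketch of SGD for linear regression with a mean-squared-error cost. The function name, synthetic data, and hyperparameter defaults are illustrative choices, not part of the algorithm description above:

```python
import numpy as np

def sgd_linear_regression(X, y, lr=0.01, n_epochs=100, tol=1e-6, seed=0):
    """Illustrative SGD for a linear model y ~ X @ w + b with squared-error cost."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = rng.normal(size=n_features)           # Initialization: random parameters
    b = 0.0
    prev_cost = np.inf
    for epoch in range(n_epochs):             # SGD loop
        for i in rng.permutation(n_samples):  # shuffle, then visit each example
            error = X[i] @ w + b - y[i]
            w -= lr * 2 * error * X[i]        # step against the gradient
            b -= lr * 2 * error
        cost = np.mean((X @ w + b - y) ** 2)
        if abs(prev_cost - cost) < tol:       # convergence check on cost change
            break
        prev_cost = cost
    return w, b                               # optimized parameters

# Illustrative usage on synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0
w, b = sgd_linear_regression(X, y)
```

Because each update uses a single example, the parameters change n_samples times per epoch, which is exactly what makes the trajectory noisy, as discussed next.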
In SGD, only one sample from the dataset is chosen at random for each iteration, so the path the algorithm takes to reach the minimum is usually noisier than that of typical Gradient Descent. In practice this matters little: the exact path is irrelevant as long as we reach the minimum, ideally with a significantly shorter training time.
[Figure: the path taken by Batch Gradient Descent.]
[Figure: the noisier path taken by Stochastic Gradient Descent.]
Note that because SGD is generally noisier than typical Gradient Descent, it usually takes a higher number of iterations to reach the minimum, owing to the randomness of its descent. Even so, each iteration processes only one example (or a small batch) rather than the full dataset, so SGD remains computationally much cheaper than typical Gradient Descent overall. Hence, in most scenarios, SGD is preferred over Batch Gradient Descent for optimizing a learning algorithm.
ML | Stochastic Gradient Descent (SGD)
Gradient Descent is an iterative optimization process that searches for an objective function's optimum value (minimum/maximum). It is one of the most widely used methods for updating a model's parameters in order to reduce a cost function in machine learning projects.
The primary goal of gradient descent is to find the model parameters that minimize the cost function and, ideally, perform well on both training and test datasets. The gradient is a vector pointing in the direction of the function's steepest ascent at a particular point. By repeatedly moving in the opposite direction of the gradient, the algorithm gradually descends toward lower values of the function until it reaches a minimum.
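Written out, the standard gradient descent update is the following; the symbols θ (parameters), α (learning rate), and J (cost function) are conventional notation introduced here for concreteness, not taken from this article:

```latex
\theta_{t+1} = \theta_t - \alpha \, \nabla_\theta J(\theta_t)
```

In SGD, the gradient ∇θ J is estimated from a single randomly chosen training example (or a small batch) rather than from the full dataset.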
Types of Gradient Descent:
Typically, there are three types of Gradient Descent:
- Batch Gradient Descent
- Stochastic Gradient Descent
- Mini-batch Gradient Descent
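The three variants differ only in how many training examples are used to compute each gradient step. A minimal sketch, assuming a linear model with a mean-squared-error cost (function and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def mse_gradient(w, X, y):
    # Gradient of the mean-squared-error cost of a linear model X @ w.
    return 2 * X.T @ (X @ w - y) / len(y)

def update(w, X, y, lr=0.01, variant="stochastic", batch_size=32):
    n = len(y)
    if variant == "batch":            # Batch GD: use the full dataset
        idx = np.arange(n)
    elif variant == "stochastic":     # SGD: one random example
        idx = rng.integers(n, size=1)
    else:                             # Mini-batch GD: a small random subset
        idx = rng.choice(n, size=batch_size, replace=False)
    return w - lr * mse_gradient(w, X[idx], y[idx])
```

Batch Gradient Descent computes the exact gradient at the cost of a full pass over the data per step, while SGD and mini-batch trade gradient accuracy for cheaper, more frequent updates.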
In this article, we will be discussing Stochastic Gradient Descent (SGD).
Table of Contents
- Stochastic Gradient Descent (SGD)
- Stochastic Gradient Descent Algorithm
- Difference between Stochastic Gradient Descent & Batch Gradient Descent
- Python Code For Stochastic Gradient Descent
- Stochastic Gradient Descent (SGD) using TensorFlow
- Advantages of Stochastic Gradient Descent
- Disadvantages of Stochastic Gradient Descent