Implementation of Stochastic Gradient Descent Regressor using Scikit-learn

We will use the diabetes dataset to fit an ordinary linear regression model as a baseline, and then train and evaluate a linear model with SGD.

Step 1: Importing the required libraries

from sklearn.datasets import load_diabetes
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

Step 2: Splitting the dataset

We will now split the dataset into training and test sets, holding out 20% of the samples for testing (a test-to-train ratio of 1:4).

X, y = load_diabetes(return_X_y=True)
print(X.shape)  # number of samples and number of features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)

Output :

(442, 10)
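
To confirm the proportions of the split, it can help to look at the shapes of the resulting arrays. The following is a small sketch that assumes the same test_size and random_state as above; the variable test_fraction is introduced here only for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

# Roughly 80% of the rows go to training and 20% to testing.
print("train:", X_train.shape, "test:", X_test.shape)

# test_fraction is an illustrative helper, not part of the original code.
test_fraction = X_test.shape[0] / X.shape[0]
print("test fraction:", round(test_fraction, 3))
```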

Step 3: Fitting the linear regression model on the training data

After training the linear regression model using the fit method, you can access the coefficients and intercept of the model. The coefficients represent the weight of each feature in the linear equation, and the intercept is the constant term.

reg = LinearRegression()
reg.fit(X_train, y_train)
print(reg.coef_)  # one coefficient per feature

Output :

[ -9.16088483 -205.46225988 516.68462383 340.62734108 -895.54360867
561.21453306 153.88478595 126.73431596 861.12139955 52.41982836]

Each number in the coefficient array corresponds to one of the ten features in the diabetes dataset. The intercept, available as reg.intercept_, is the constant term in the linear model.
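
To make the role of the coefficients and the intercept concrete, the sketch below rebuilds the predictions by hand as a dot product plus the constant term and compares them with reg.predict. The name manual_pred is introduced here only for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

reg = LinearRegression()
reg.fit(X_train, y_train)

# Linear model: prediction = features . coefficients + intercept
manual_pred = X_test @ reg.coef_ + reg.intercept_

# The hand-computed values should match the model's own predictions.
print(np.allclose(manual_pred, reg.predict(X_test)))
```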

Step 4: Evaluating the model

The code below predicts values for the test set (X_test) with the trained regression model (reg), and then computes the R-squared score between the predicted values (y_pred) and the actual values (y_test).

y_pred = reg.predict(X_test)
print("R2 score:", r2_score(y_test, y_pred))
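
For intuition about what the R-squared score measures, here is a minimal sketch of the formula R^2 = 1 - SS_res / SS_tot written with NumPy. The function name r2_manual and the toy values are assumptions made for this example and are not part of scikit-learn.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Compute R^2 = 1 - SS_res / SS_tot (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]  # toy target values

perfect = r2_manual(y_true, y_true)      # predictions equal the targets
baseline = r2_manual(y_true, [6.0] * 4)  # always predict the mean (6.0)
print(perfect, baseline)
```

A score of 1 means the predictions match the targets exactly, while a model that always predicts the mean of the targets scores 0.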



Step 5: Implementing SGD

We will now train the Stochastic Gradient Descent regressor provided by the scikit-learn library.

from sklearn.linear_model import SGDRegressor
# Constant learning rate of 0.01, at most 100 passes over the training data
reg = SGDRegressor(max_iter=100, learning_rate='constant', eta0=0.01)
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

Output :

SGDRegressor(learning_rate='constant', max_iter=100)
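
The SGD predictions can be scored in the same way as the linear regression baseline. The sketch below is one way to do this; random_state=0 and the names sgd and sgd_score are assumptions added here so that repeated runs give the same result, and no particular score is implied.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2
)

# random_state=0 is an assumption added for reproducibility.
sgd = SGDRegressor(max_iter=100, learning_rate='constant', eta0=0.01,
                   random_state=0)
sgd.fit(X_train, y_train)

# Score the SGD model on the held-out test set.
sgd_score = r2_score(y_test, sgd.predict(X_test))
print("SGD R2 score:", sgd_score)
```

Because SGD visits the samples in a shuffled order, the score can vary between runs unless random_state is fixed.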

scikit-learn's SGDRegressor benefits from optimized internal routines, several built-in learning rate schedules, and regularization options, which typically lets it perform better than a hand-written SGD implementation.
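
For comparison, a hand-written SGD loop for linear regression with a constant learning rate might look like the sketch below. It is a simplified illustration on synthetic data, not scikit-learn's internal algorithm; the data, true_w, true_b, and eta are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = X . true_w + true_b + noise
X = rng.normal(size=(200, 3))
true_w, true_b = np.array([2.0, -1.0, 0.5]), 4.0
y = X @ true_w + true_b + rng.normal(scale=0.1, size=200)

w, b = np.zeros(3), 0.0
eta = 0.01  # constant learning rate

for epoch in range(50):
    # Visit the samples in a fresh random order each epoch.
    for i in rng.permutation(len(X)):
        error = X[i] @ w + b - y[i]  # prediction error for one sample
        w -= eta * error * X[i]      # gradient step for the weights
        b -= eta * error             # gradient step for the intercept

print(np.round(w, 2), round(b, 2))
```

Each update uses a single sample, which is what makes the method "stochastic"; the learned w and b should end up close to true_w and true_b.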

Stochastic Gradient Descent (SGD) is a popular optimization technique in the field of machine learning. It is particularly well-suited for handling large datasets and online learning scenarios where data arrives sequentially. In this article, we will discuss how a stochastic gradient descent regressor is implemented using Scikit-Learn.
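
The online learning scenario mentioned above can be sketched with SGDRegressor's partial_fit method, which updates the model one batch at a time. The simulated stream, true_w, and the batch size below are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0])  # assumed ground-truth weights for the stream

model = SGDRegressor(learning_rate='constant', eta0=0.01, random_state=0)

# Simulate data arriving sequentially in small batches.
for _ in range(200):
    X_batch = rng.normal(size=(20, 2))
    y_batch = X_batch @ true_w + rng.normal(scale=0.1, size=20)
    model.partial_fit(X_batch, y_batch)  # one incremental update per batch

print(model.coef_)
```

Since each call only needs the current batch, the full dataset never has to fit in memory at once.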

