Regularization Techniques Implementation

We’ll compare the performance of Linear Regression, Lasso, Ridge, and ElasticNet regression models on the California Housing dataset.

Libraries Imported:

We import the necessary functions and classes from the scikit-learn (sklearn) library.

  • fetch_california_housing is used to load the California Housing dataset.
  • train_test_split is used to split the dataset into training and testing sets.
  • LinearRegression, Lasso, Ridge, and ElasticNet are the classes for linear regression, Lasso regression, Ridge regression, and ElasticNet regression, respectively.
  • mean_squared_error computes the mean squared error (MSE), a common metric for evaluating regression models.

Python3




from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.metrics import mean_squared_error


Dataset Loading and Splitting:

We use the fetch_california_housing function to load the California Housing dataset. This dataset contains features related to housing in California districts, and the target variable is the median house value, expressed in units of $100,000. X contains the features (input variables) and y contains the target variable. We split the dataset into training and testing sets using the train_test_split function: test_size=0.2 holds out 20% of the data for testing and leaves 80% for training, and random_state=42 makes the split reproducible.

Python3




california = fetch_california_housing()
X, y = california.data, california.target
 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
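
As a quick sanity check on the split sizes, the sketch below applies the same train_test_split call to placeholder arrays with the dataset's shape (20,640 samples, 8 features), so it runs without downloading anything. The names X_demo and y_demo are assumptions made for illustration and hold dummy zeros, not the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the California Housing data
# (20,640 samples, 8 features); the values themselves are dummy zeros.
X_demo = np.zeros((20640, 8))
y_demo = np.zeros(20640)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# An 80/20 split of 20,640 rows gives 16,512 training and 4,128 test rows.
print(X_tr.shape, X_te.shape)
```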


Initializing Models:

  • We initialize instances of the Linear Regression, Lasso, Ridge, and ElasticNet models. Only LinearRegression is created with its default settings.
  • alpha=0.1 specifies the regularization strength for the Lasso and Ridge models.
  • In ElasticNet(alpha=0.1, l1_ratio=0.5), the alpha parameter controls the overall regularization strength, and the l1_ratio parameter sets the mix between the L1 (Lasso) and L2 (Ridge) penalties: l1_ratio=1 is a pure L1 penalty and l1_ratio=0 is a pure L2 penalty.

Python3




linear_model = LinearRegression()
lasso_model = Lasso(alpha=0.1)
ridge_model = Ridge(alpha=0.1)
elasticnet_model = ElasticNet(alpha=0.1, l1_ratio=0.5)
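
To make the roles of alpha and l1_ratio concrete, here is a minimal sketch of the penalty term that scikit-learn documents for ElasticNet: alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2. The helper elastic_net_penalty and the coefficient vector w are hypothetical names used only for this illustration; they are not part of scikit-learn.

```python
import numpy as np

def elastic_net_penalty(coef, alpha=0.1, l1_ratio=0.5):
    """Penalty term of the ElasticNet objective for a coefficient vector."""
    coef = np.asarray(coef, dtype=float)
    l1_norm = np.sum(np.abs(coef))   # ||w||_1
    l2_sq = np.sum(coef ** 2)        # ||w||_2^2
    return alpha * l1_ratio * l1_norm + 0.5 * alpha * (1.0 - l1_ratio) * l2_sq

w = [0.5, -2.0, 0.0]  # hypothetical coefficients, for illustration only

mixed = elastic_net_penalty(w)                   # equal mix, as in the model above
pure_l1 = elastic_net_penalty(w, l1_ratio=1.0)   # reduces to alpha * ||w||_1
pure_l2 = elastic_net_penalty(w, l1_ratio=0.0)   # reduces to 0.5 * alpha * ||w||_2^2
print(mixed, pure_l1, pure_l2)
```

Raising alpha scales the whole penalty, while moving l1_ratio toward 1 shifts weight from the squared (L2) term to the absolute-value (L1) term.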


Training Models:

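We train each model on the training data using the fit method. Fitting estimates the coefficients that minimize each model's loss function: the squared error alone for Linear Regression, and the squared error plus the model's penalty term for Lasso, Ridge, and ElasticNet.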

Python3




linear_model.fit(X_train, y_train)
lasso_model.fit(X_train, y_train)
ridge_model.fit(X_train, y_train)
elasticnet_model.fit(X_train, y_train)
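
After fitting, the learned coefficients are available in each model's coef_ attribute, which is one way to see what a penalty does. The sketch below uses synthetic stand-in data from make_regression rather than the housing data, and alpha=1.0 is chosen only for illustration; all variable names here are assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso

# Synthetic stand-in data: 10 features, only 3 of which drive the target.
X_syn, y_syn = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=1.0, random_state=0
)

ols = LinearRegression().fit(X_syn, y_syn)
lasso = Lasso(alpha=1.0).fit(X_syn, y_syn)  # alpha chosen for illustration

# Count coefficients that are exactly zero in each fitted model.
ols_zeros = int(np.sum(ols.coef_ == 0))
lasso_zeros = int(np.sum(lasso.coef_ == 0))
print("Zero coefficients - Linear:", ols_zeros, "Lasso:", lasso_zeros)

# Compare the total absolute size of the coefficients.
print("Sum of |coef| - Linear:", np.abs(ols.coef_).sum(),
      "Lasso:", np.abs(lasso.coef_).sum())
```

The same inspection can be applied to linear_model.coef_, lasso_model.coef_, ridge_model.coef_, and elasticnet_model.coef_ from the models trained above.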


Model Evaluation and Prediction:

We use the mean_squared_error function to calculate the mean squared error between actual and predicted values for both training and testing datasets.

Python3




linear_train_mse = mean_squared_error(y_train, linear_model.predict(X_train))
linear_test_mse = mean_squared_error(y_test, linear_model.predict(X_test))
 
lasso_train_mse = mean_squared_error(y_train, lasso_model.predict(X_train))
lasso_test_mse = mean_squared_error(y_test, lasso_model.predict(X_test))
 
ridge_train_mse = mean_squared_error(y_train, ridge_model.predict(X_train))
ridge_test_mse = mean_squared_error(y_test, ridge_model.predict(X_test))
 
elasticnet_train_mse = mean_squared_error(y_train, elasticnet_model.predict(X_train))
elasticnet_test_mse = mean_squared_error(y_test, elasticnet_model.predict(X_test))
 
print("Linear Regression Model - Train MSE:", linear_train_mse)
print("Linear Regression Model - Test MSE:", linear_test_mse)
 
print("\nLasso Regression Model - Train MSE:", lasso_train_mse)
print("Lasso Regression Model - Test MSE:", lasso_test_mse)
 
print("\nRidge Regression Model - Train MSE:", ridge_train_mse)
print("Ridge Regression Model - Test MSE:", ridge_test_mse)
 
print("\nElasticNet Regression Model - Train MSE:", elasticnet_train_mse)
print("ElasticNet Regression Model - Test MSE:", elasticnet_test_mse)


Output:

Linear Regression Model - Train MSE: 0.5179331255246699
Linear Regression Model - Test MSE: 0.5558915986952422

Lasso Regression Model - Train MSE: 0.60300014172392
Lasso Regression Model - Test MSE: 0.6135115198058131

Ridge Regression Model - Train MSE: 0.5179331264220425
Ridge Regression Model - Test MSE: 0.5558827543113783

ElasticNet Regression Model - Train MSE: 0.5622311141903511
ElasticNet Regression Model - Test MSE: 0.5730994198028208
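
MSE is expressed in squared target units, so it can be easier to read as a root mean squared error (RMSE). The sketch below takes the square root of the test MSE values copied from the output above; because the target is in units of $100,000, multiplying the RMSE by 100,000 gives a rough error in dollars. The dictionary names are assumptions for illustration.

```python
import math

# Test MSE values copied from the output above.
test_mse = {
    "Linear": 0.5558915986952422,
    "Lasso": 0.6135115198058131,
    "Ridge": 0.5558827543113783,
    "ElasticNet": 0.5730994198028208,
}

# RMSE is in the target's units (hundreds of thousands of dollars).
test_rmse = {name: math.sqrt(mse) for name, mse in test_mse.items()}

for name, rmse in test_rmse.items():
    print(f"{name}: RMSE = {rmse:.4f} (about ${rmse * 100_000:,.0f})")
```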

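On this dataset, Linear Regression and Ridge give almost identical results (test MSE of about 0.556), while ElasticNet (about 0.573) and Lasso (about 0.614) have higher error on both the training and test sets. Ridge tracks Linear Regression so closely because alpha=0.1 is a very light L2 penalty for a training set of this size. None of the alpha values were tuned and the features were not standardized, so these numbers describe this particular configuration rather than the methods in general.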
