Regularization Techniques Implementation
We’ll compare the performance of Linear Regression, Lasso, Ridge, and ElasticNet regression models on the California Housing dataset.
Libraries Imported:
We import the necessary functions and classes from the scikit-learn (sklearn) library:
- fetch_california_housing is used to load the California Housing dataset.
- train_test_split is used to split the dataset into training and testing sets.
- LinearRegression, Lasso, Ridge, and ElasticNet are the classes for ordinary linear regression, Lasso (L1) regression, Ridge (L2) regression, and ElasticNet (combined L1 and L2) regression, respectively.
- mean_squared_error computes the mean squared error (MSE), a common metric for evaluating regression models.
```python
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.metrics import mean_squared_error
```
Dataset Loading and Splitting:
We use the fetch_california_housing function to load the California Housing dataset. It contains features describing housing in California districts, and the target variable is the median house value (in units of $100,000). X holds the features (input variables) and y holds the target. We then split the data with the train_test_split function: 80% of the samples are used for training and 20% for testing (test_size=0.2). Setting random_state=42 makes the split reproducible.
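```python
# Load the dataset (downloaded and cached on first use)
california = fetch_california_housing()
X, y = california.data, california.target

# Hold out 20% of the samples for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```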
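To see what the 80/20 split produces, here is a minimal sketch that uses placeholder arrays with the same shape as the California Housing data (20,640 samples, 8 features) instead of downloading it. The placeholder arrays are an assumption made only so the example runs offline; the split settings match the code above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the California Housing shape (20,640 x 8),
# used instead of the real download so the example runs offline.
X = np.arange(20640 * 8, dtype=float).reshape(20640, 8)
y = np.arange(20640, dtype=float)

# Same settings as the article: 20% test data, fixed seed.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # (16512, 8) (4128, 8)

# Re-running with the same random_state reproduces the identical split.
X_train2, X_test2, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
print(np.array_equal(X_test, X_test2))  # True
```

The test set size is computed as ceil(0.2 × 20,640) = 4,128 rows, and the remaining 16,512 rows go to training.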
Initializing Models:
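- We initialize a Linear Regression model with its default settings, along with Lasso, Ridge, and ElasticNet models with explicit regularization settings.
- alpha=0.1 sets the regularization strength for the Lasso, Ridge, and ElasticNet models (scikit-learn's default is alpha=1.0); larger values mean stronger regularization.
- In the ElasticNet initialization, ElasticNet(alpha=0.1, l1_ratio=0.5), the alpha parameter controls the overall regularization strength, and the l1_ratio parameter sets the mix between the L1 (Lasso) and L2 (Ridge) penalties: l1_ratio=1 is a pure L1 penalty, l1_ratio=0 is a pure L2 penalty, and values in between combine the two.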
```python
linear_model = LinearRegression()                       # ordinary least squares, no penalty
lasso_model = Lasso(alpha=0.1)                          # L1 penalty
ridge_model = Ridge(alpha=0.1)                          # L2 penalty
elasticnet_model = ElasticNet(alpha=0.1, l1_ratio=0.5)  # mix of L1 and L2 penalties
```
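To make the roles of alpha and l1_ratio concrete, here are the objectives the four estimators minimize, written as in the scikit-learn user guide (intercept omitted; n is the number of training samples and ρ stands for l1_ratio). Treat this as a reference summary rather than a derivation:

```latex
\begin{aligned}
\text{LinearRegression:} \quad & \min_{w} \; \lVert y - Xw \rVert_2^2 \\
\text{Ridge:} \quad & \min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\text{Lasso:} \quad & \min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1 \\
\text{ElasticNet:} \quad & \min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \rho \lVert w \rVert_1 + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
\end{aligned}
```

With alpha=0.1 and l1_ratio=0.5, ElasticNet therefore weights the L1 norm by 0.05 and the squared L2 norm by 0.025. Note that Ridge's squared-error term is not divided by 2n, so the same alpha value does not mean the same effective strength across these estimators.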
Training Models:
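We train each model on the training data using the fit method, which finds the coefficients that minimize that model's objective: the squared error alone for Linear Regression, or the squared error plus a penalty on the coefficients for Lasso, Ridge, and ElasticNet.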
```python
linear_model.fit(X_train, y_train)
lasso_model.fit(X_train, y_train)
ridge_model.fit(X_train, y_train)
elasticnet_model.fit(X_train, y_train)
```
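To make "finding the optimal parameters" concrete for one model, here is a minimal sketch that checks Ridge.fit against Ridge's closed-form solution. It uses synthetic data from make_regression (an assumption, so it runs offline) and relies on the fact that Ridge has an exact solution; Lasso and ElasticNet have no general closed form, which is why scikit-learn fits them iteratively with coordinate descent.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic stand-in data (assumption: not the California Housing data).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
alpha = 0.1

# What fit() learns: scikit-learn's Ridge with an (unpenalized) intercept.
ridge = Ridge(alpha=alpha).fit(X, y)

# Same minimizer in closed form: center X and y so the intercept is not
# penalized, then solve (Xc^T Xc + alpha * I) w = Xc^T yc.
X_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - X_mean, y - y_mean
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
b = y_mean - X_mean @ w

print(np.allclose(ridge.coef_, w), np.allclose(ridge.intercept_, b))  # True True
```

Centering first is what keeps the intercept out of the penalty; the intercept is then recovered from the feature and target means.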
Model Evaluation and Prediction:
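We use the mean_squared_error function to compute the mean squared error between the actual and predicted values on both the training and the testing sets. Comparing the two values for each model shows how well it generalizes beyond the data it was trained on.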
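As a quick check of what mean_squared_error computes (the average of the squared differences between actual and predicted values), here is a small hand-checkable sketch; the four sample values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values chosen so the arithmetic is easy to verify by hand.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# MSE = (1/n) * sum((y_i - y_hat_i)^2) = (0.25 + 0.25 + 0 + 1) / 4
manual_mse = np.mean((y_true - y_pred) ** 2)
print(manual_mse)                          # 0.375
print(mean_squared_error(y_true, y_pred))  # 0.375
```

Because the California Housing target is in units of $100,000, taking the square root of an MSE (the RMSE) puts the error back in those units; for example, an MSE of 0.556 corresponds to an RMSE of about 0.746, or roughly $74,600.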
```python
linear_train_mse = mean_squared_error(y_train, linear_model.predict(X_train))
linear_test_mse = mean_squared_error(y_test, linear_model.predict(X_test))
lasso_train_mse = mean_squared_error(y_train, lasso_model.predict(X_train))
lasso_test_mse = mean_squared_error(y_test, lasso_model.predict(X_test))
ridge_train_mse = mean_squared_error(y_train, ridge_model.predict(X_train))
ridge_test_mse = mean_squared_error(y_test, ridge_model.predict(X_test))
elasticnet_train_mse = mean_squared_error(y_train, elasticnet_model.predict(X_train))
elasticnet_test_mse = mean_squared_error(y_test, elasticnet_model.predict(X_test))

print("Linear Regression Model - Train MSE:", linear_train_mse)
print("Linear Regression Model - Test MSE:", linear_test_mse)
print("\nLasso Regression Model - Train MSE:", lasso_train_mse)
print("Lasso Regression Model - Test MSE:", lasso_test_mse)
print("\nRidge Regression Model - Train MSE:", ridge_train_mse)
print("Ridge Regression Model - Test MSE:", ridge_test_mse)
print("\nElasticNet Regression Model - Train MSE:", elasticnet_train_mse)
print("ElasticNet Regression Model - Test MSE:", elasticnet_test_mse)
```
Output:
```
Linear Regression Model - Train MSE: 0.5179331255246699
Linear Regression Model - Test MSE: 0.5558915986952422

Lasso Regression Model - Train MSE: 0.60300014172392
Lasso Regression Model - Test MSE: 0.6135115198058131

Ridge Regression Model - Train MSE: 0.5179331264220425
Ridge Regression Model - Test MSE: 0.5558827543113783

ElasticNet Regression Model - Train MSE: 0.5622311141903511
ElasticNet Regression Model - Test MSE: 0.5730994198028208
```
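On this split, Linear Regression and Ridge Regression reach the lowest test MSE (about 0.556, with Ridge marginally lower), followed by ElasticNet (about 0.573) and Lasso (about 0.614). Ridge with alpha=0.1 is almost indistinguishable from plain Linear Regression because its penalty is tiny compared with the unscaled sum of squared errors it is added to. Lasso and ElasticNet, whose squared-error term is averaged over the samples, are regularized much more strongly at the same alpha; both their training and test errors rise, which suggests that this much L1 shrinkage removes more useful signal than it prevents overfitting on this dataset.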
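The differences above are tied to how each penalty shrinks the coefficients. Here is a minimal sketch of that shrinkage on synthetic make_regression data where only 3 of 10 features matter; the data and the alpha values (10.0 for Ridge, 2.0 for Lasso) are arbitrary assumptions for illustration, not the article's settings.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic stand-in data: 10 features, only 3 of which affect the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=2.0).fit(X, y)

# Ridge shrinks the L2 norm of the coefficients; Lasso shrinks the L1 norm
# and can set individual coefficients exactly to zero.
print("OLS   L2 norm:", np.linalg.norm(ols.coef_))
print("Ridge L2 norm:", np.linalg.norm(ridge.coef_))
print("OLS   L1 norm:", np.abs(ols.coef_).sum())
print("Lasso L1 norm:", np.abs(lasso.coef_).sum())
print("Exact zeros - Ridge:", np.sum(ridge.coef_ == 0), "Lasso:", np.sum(lasso.coef_ == 0))
```

Ridge pulls every coefficient toward zero without eliminating any, while Lasso can drop features entirely. That helps when some features are pure noise, but it costs accuracy when the dropped or shrunk features still carry signal.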
Regularization Techniques in Machine Learning
Overfitting is a major concern in machine learning, because models are designed to extract complex patterns from data. A model overfits when it memorizes the training data instead of learning patterns that generalize to new data, and as a result it may perform poorly in real-world situations. Regularization is a powerful way to address this: by penalizing model complexity, it provides a systematic way to avoid overfitting and improve how well machine learning models generalize.