Implementing Multiregression with CatBoost
Let’s dive into a practical example of using CatBoost for multiregression:
Install CatBoost
Ensure you have CatBoost installed in your Python environment. You can install it via pip:
pip install catboost
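To confirm the installation worked before moving on, one option is to check whether Python can locate the package. The following is a minimal sketch using only the standard library; the helper name `is_installed` is made up for illustration and is not part of CatBoost.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the import system can find the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Prints True once `pip install catboost` has succeeded in this environment
print("catboost installed:", is_installed("catboost"))
```

If this prints False, the package was probably installed into a different Python environment than the one running the script.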
Step 1: Loading a Public Dataset
We’ll use a publicly accessible online dataset for this example and load it directly from its URL.
import pandas as pd
# Load dataset
url = 'https://media.w3wiki.org/wp-content/uploads/20240527142547/BostonHousing.csv'
df = pd.read_csv(url)
print(df.head())
Output:
      crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio  \
0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296     15.3
1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8
2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8
3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7
4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7
        b  lstat  medv
0  396.90   4.98  24.0
1  396.90   9.14  21.6
2  392.83   4.03  34.7
3  394.63   2.94  33.4
4  396.90   5.33  36.2
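Before modeling, a quick look at column types and missing values can show whether any cleaning or encoding is needed. The sketch below runs on a tiny stand-in frame whose values are invented for illustration (they are not taken from the dataset), and the helper name `summarize_columns` is hypothetical.

```python
import pandas as pd


def summarize_columns(frame):
    """Return one row per column: dtype, missing-value count, non-numeric flag."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "is_categorical": [
            not pd.api.types.is_numeric_dtype(t) for t in frame.dtypes
        ],
    })


# Toy data: one numeric column with a gap, one integer flag, one text column
toy = pd.DataFrame({
    "rm": [6.5, None, 7.1],
    "chas": [0, 0, 1],
    "zone": ["a", "b", "a"],
})
column_summary = summarize_columns(toy)
print(column_summary)
```

Calling `summarize_columns(df)` on the loaded frame would give the same kind of overview for the real data.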
Step 2: Preprocessing Data
Before preparing the data for modeling, we look at the distribution of the target variable, medv (the median value of homes).
import seaborn as sns
import matplotlib.pyplot as plt
# Visualize the distribution of the target variable
sns.histplot(df['medv'], bins=30, kde=True)
plt.title('Distribution of MEDV (Median Value of Homes)')
plt.savefig('Distribution.webp')
plt.show()
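Output: [Figure: histogram of medv with a KDE curve, titled "Distribution of MEDV (Median Value of Homes)"]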
Next, we get the data ready for the model. In general this covers handling missing values, standardizing numeric features, and encoding categorical features; here we split the data into training and test sets and standardize the features.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Split the data into features and target
X = df.drop('medv', axis=1)
y = df['medv']
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Standardize the features (fit the scaler on the training set only to avoid leakage)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
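The fit-on-train, transform-on-test pattern used here can be checked on synthetic data. This is a minimal sketch with randomly generated features (the `_demo` names are invented for illustration): the training columns end up with mean 0 and standard deviation 1, while the test columns are only approximately centered because they reuse the training statistics. Tree-based models such as CatBoost are generally insensitive to this kind of linear rescaling, so the step is usually optional for them.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic features with a non-zero mean and non-unit spread
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
X_test_demo = rng.normal(loc=50.0, scale=10.0, size=(50, 3))

scaler_demo = StandardScaler()
X_train_std = scaler_demo.fit_transform(X_train_demo)  # learns mean/std from train
X_test_std = scaler_demo.transform(X_test_demo)        # reuses the train statistics

print("train mean:", X_train_std.mean(axis=0).round(6))
print("train std: ", X_train_std.std(axis=0).round(6))
print("test mean: ", X_test_std.mean(axis=0).round(3))
```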
Step 3: Train the Model
Now, we will define and train our CatBoost regressor model.
from catboost import CatBoostRegressor
# Initialize the CatBoostRegressor
model = CatBoostRegressor(
    iterations=1000, learning_rate=0.05, depth=3, loss_function='RMSE', verbose=200)
# Fit the model
model.fit(X_train_scaled, y_train)
Output:
0: learn: 9.0223472 total: 138ms remaining: 2m 18s
200: learn: 2.4369710 total: 252ms remaining: 1s
400: learn: 1.8078506 total: 365ms remaining: 545ms
600: learn: 1.4641839 total: 475ms remaining: 315ms
800: learn: 1.2249782 total: 587ms remaining: 146ms
999: learn: 1.0551550 total: 696ms remaining: 0us
<catboost.core.CatBoostRegressor at 0x193071691d0>
Step 4: Making Predictions and Evaluating the Model
After training, we make predictions on the test set and evaluate our model using RMSE.
from sklearn.metrics import mean_squared_error
# Make predictions
predictions = model.predict(X_test_scaled)
# Calculate RMSE as the square root of MSE (the squared=False argument
# is deprecated in newer scikit-learn releases)
rmse = mean_squared_error(y_test, predictions) ** 0.5
print(f'Root Mean Squared Error: {rmse}')
Output:
Root Mean Squared Error: 2.9516912601424115
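RMSE is the square root of the mean squared difference between actual and predicted values, so it is expressed in the same units as the target. A minimal sketch computing it by hand with NumPy follows; the function name `rmse_manual` and the sample numbers are invented for illustration.

```python
import numpy as np


def rmse_manual(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Errors of -1, 1 and 2 give squared errors 1, 1 and 4, so MSE = 2
print(rmse_manual([24.0, 21.6, 34.7], [25.0, 20.6, 32.7]))
```

If this CSV follows the standard Boston Housing convention of recording medv in thousands of dollars, an RMSE near 2.95 corresponds to a typical error of roughly $2,950.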
Step 5: Visualizing the Results
Finally, to assess the model visually, we plot the predicted values against the actual values.
# Visualize the actual vs predicted values
plt.scatter(y_test, predictions)
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title('Actual vs Predicted Values')
# Diagonal reference line: points on it are perfect predictions
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red')
plt.show()
Output: [Figure: scatter plot of actual vs predicted values with a red diagonal reference line]
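The scatter plot can be complemented with a few numbers describing the residuals (predicted minus actual). Points above the diagonal are over-predictions and points below it are under-predictions. Here is a minimal sketch, with a hypothetical helper `residual_summary` and made-up sample values:

```python
import numpy as np


def residual_summary(y_true, y_pred):
    """Summarize residuals; positive residual = point above the diagonal."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_pred - y_true
    return {
        "mean_residual": float(residuals.mean()),            # overall bias
        "max_abs_residual": float(np.abs(residuals).max()),  # worst single miss
        "share_over_predicted": float((residuals > 0).mean()),
    }


resid_stats = residual_summary([24.0, 21.6, 34.7, 33.4],
                               [25.0, 20.6, 32.7, 33.4])
print(resid_stats)
```

A mean residual far from zero would suggest the model is systematically biased in one direction, which the scatter plot alone can make hard to judge.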
This example walks through using CatBoost for multiregression, covering data preparation, model training, evaluation, and result visualization. Practice and experimentation are key to mastering machine learning, so feel free to try other datasets and adjust the parameters to see how the model's performance changes.
Multiregression using CatBoost
Multiregression, also known as multiple regression, is a statistical method used to predict a target variable based on two or more predictor variables. This technique is widely used in various fields such as finance, economics, marketing, and machine learning. CatBoost, a powerful gradient boosting library, provides efficient and robust algorithms for multiregression tasks. In this article, we will explore how to leverage CatBoost for multiregression and achieve accurate predictions.
Table of Contents
- Understanding Multiregression
- What is CatBoost?
- Implementing Multiregression with CatBoost
- Pros & Cons of Using CatBoost for Multiregression
- Conclusion