Implementing Multiregression with CatBoost

Let’s dive into a practical example of using CatBoost for multiregression:

Install CatBoost

Ensure you have CatBoost installed in your Python environment. You can install it via pip:

pip install catboost
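
To confirm the installation worked before moving on, a quick check like the one below can help. This is only a sketch: the helper name `is_installed` is made up for illustration, and it relies on the standard-library `importlib.util.find_spec` call.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("catboost"):
    import catboost
    print("CatBoost version:", catboost.__version__)
else:
    print("CatBoost is not installed; run: pip install catboost")
```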

Step 1: Loading a Public Dataset

We’ll use a publicly accessible online dataset for this example and load it directly from its URL.

Python

import pandas as pd

# Load dataset
url = 'https://media.w3wiki.org/wp-content/uploads/20240527142547/BostonHousing.csv'
df = pd.read_csv(url)
print(df.head())

Output:

      crim    zn  indus  chas    nox     rm   age     dis  rad  tax  ptratio  \
0  0.00632  18.0   2.31     0  0.538  6.575  65.2  4.0900    1  296     15.3
1  0.02731   0.0   7.07     0  0.469  6.421  78.9  4.9671    2  242     17.8
2  0.02729   0.0   7.07     0  0.469  7.185  61.1  4.9671    2  242     17.8
3  0.03237   0.0   2.18     0  0.458  6.998  45.8  6.0622    3  222     18.7
4  0.06905   0.0   2.18     0  0.458  7.147  54.2  6.0622    3  222     18.7

        b  lstat  medv
0  396.90   4.98  24.0
1  396.90   9.14  21.6
2  392.83   4.03  34.7
3  394.63   2.94  33.4
4  396.90   5.33  36.2
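
Before modeling, it can be useful to check column types and missing values. The sketch below uses a hypothetical helper, `summarize_frame` (not a pandas function), and a small hand-made frame holding the first three rows of three columns from the output above, so that it runs without network access. In the walkthrough you would pass `df` instead.

```python
import pandas as pd


def summarize_frame(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype and number of missing values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
    })


# Stand-in for `df`: first three rows of three columns from the output above
sample = pd.DataFrame({
    "crim": [0.00632, 0.02731, 0.02729],
    "rm": [6.575, 6.421, 7.185],
    "medv": [24.0, 21.6, 34.7],
})

summary = summarize_frame(sample)
print(summary)
```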

Step 2: Preprocessing Data

Before preparing the data for modeling, we first look at the distribution of the target variable, medv.

Python

import seaborn as sns
import matplotlib.pyplot as plt

# Visualize the distribution of the target variable
sns.histplot(df['medv'], bins=30, kde=True)
plt.title('Distribution of MEDV (Median Value of Homes)')
plt.savefig('Distribution.webp')
plt.show()

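Output:

[Figure: histogram of medv with a KDE curve, titled "Distribution of MEDV (Median Value of Homes)"]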

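Next, we get the data ready for the model. In general this can involve handling missing values, standardizing numeric features, and encoding categorical features. Here we split the data into training and test sets and standardize the features. Tree-based models such as CatBoost generally do not require feature scaling, so the standardization step is optional.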

Python

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Split the data into features and target
X = df.drop('medv', axis=1)
y = df['medv']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Normalize the feature data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
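
The text above mentions missing values, which the split-and-scale code does not handle. One way to cover that case is sketched below using scikit-learn's `SimpleImputer` inside a pipeline. The synthetic arrays `X_demo` and `y_demo` are assumptions made for illustration, not part of the housing data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix, with a few values knocked out
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 4))
X_demo[rng.integers(0, 100, size=10), rng.integers(0, 4, size=10)] = np.nan
y_demo = rng.normal(size=100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42)

# Fill missing values with the per-column median, then standardize
preprocess = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())

# Fit on the training split only, then reuse those statistics on the test split
X_tr_ready = preprocess.fit_transform(X_tr)
X_te_ready = preprocess.transform(X_te)

print("Missing after preprocessing:",
      int(np.isnan(X_tr_ready).sum()), int(np.isnan(X_te_ready).sum()))
```

Fitting the imputer and scaler on the training split alone keeps information from the test split out of the preprocessing statistics, which mirrors the `fit_transform` / `transform` pattern used above.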

Step 3: Train the Model

Now, we will define and train our CatBoost regressor model.

Python

from catboost import CatBoostRegressor

# Initialize the CatBoostRegressor
model = CatBoostRegressor(
    iterations=1000, learning_rate=0.05, depth=3, loss_function='RMSE', verbose=200)

# Fit the model
model.fit(X_train_scaled, y_train)

Output:

0: learn: 9.0223472 total: 138ms remaining: 2m 18s
200: learn: 2.4369710 total: 252ms remaining: 1s
400: learn: 1.8078506 total: 365ms remaining: 545ms
600: learn: 1.4641839 total: 475ms remaining: 315ms
800: learn: 1.2249782 total: 587ms remaining: 146ms
999: learn: 1.0551550 total: 696ms remaining: 0us
<catboost.core.CatBoostRegressor at 0x193071691d0>

Step 4: Making Predictions and Evaluating the Model

After training, we make predictions on the test set and evaluate our model using RMSE.

Python

from sklearn.metrics import mean_squared_error

# Make predictions
predictions = model.predict(X_test_scaled)

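# Calculate RMSE as the square root of the MSE
# (the `squared=False` argument is deprecated in newer scikit-learn versions)
rmse = mean_squared_error(y_test, predictions) ** 0.5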
print(f'Root Mean Squared Error: {rmse}')

Output:

Root Mean Squared Error: 2.9516912601424115
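
An RMSE value is easier to interpret next to other metrics and a simple baseline. The sketch below computes RMSE, MAE, and R² with NumPy and compares them with a baseline that always predicts the mean of the true values. The function name `regression_report` and the three-point example are made up for illustration.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Return RMSE, MAE, R^2, and the RMSE of a predict-the-mean baseline."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    deviations = y_true - y_true.mean()  # errors of the mean-only baseline
    return {
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
        "r2": 1.0 - float(np.sum(residuals ** 2) / np.sum(deviations ** 2)),
        "baseline_rmse": float(np.sqrt(np.mean(deviations ** 2))),
    }


# Tiny made-up example; in the walkthrough you would pass y_test and predictions
report = regression_report([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(report)
```

A model RMSE well below `baseline_rmse` indicates the model is doing better than simply predicting the average.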

Step 5: Visualizing the Results

Lastly, to assess the model visually, we plot the actual values against the predictions. Points close to the diagonal line indicate accurate predictions.

Python

# Visualize the actual vs predicted values
plt.scatter(y_test, predictions)
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title('Actual vs Predicted Values')
limits = [min(y_test), max(y_test)]
plt.plot(limits, limits, color='red')  # Diagonal line (perfect predictions)
plt.show()

Output:

[Figure: scatter plot of actual vs predicted values with a red diagonal reference line]
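
A residual plot is a common companion to the actual-vs-predicted scatter, since it makes systematic over- or under-prediction easier to spot. The sketch below assumes a hypothetical helper, `plot_residuals`, and three made-up value pairs. In the walkthrough you would call it with `y_test` and `predictions`, followed by `plt.show()`.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_residuals(y_true, y_pred, ax=None):
    """Scatter residuals (actual - predicted) against predictions; return the Axes."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if ax is None:
        _, ax = plt.subplots()
    ax.scatter(y_pred, y_true - y_pred)
    ax.axhline(0.0, color='red')  # zero-error reference line
    ax.set_xlabel('Predicted Values')
    ax.set_ylabel('Residual (Actual - Predicted)')
    ax.set_title('Residuals vs Predicted Values')
    return ax


# Made-up values for illustration only
ax = plot_residuals([24.0, 21.6, 34.7], [25.1, 20.9, 33.0])
```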

This example walks through using CatBoost for multiregression, covering data preparation, model training, evaluation, and result visualization. Practice and experimentation are key to mastering machine learning, so feel free to try other datasets and parameter settings and observe how the model performs.

Multiregression using CatBoost

Multiregression, also known as multiple regression, is a statistical method used to predict a target variable based on two or more predictor variables. This technique is widely used in various fields such as finance, economics, marketing, and machine learning. CatBoost, a powerful gradient boosting library, provides efficient and robust algorithms for multiregression tasks. In this article, we will explore how to leverage CatBoost for multiregression and achieve accurate predictions.
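
To make "two or more predictor variables" concrete, the sketch below fits a classical linear multiple regression with ordinary least squares in NumPy. It is only an illustration of the statistical idea: the data and the rule `y = 1 + 2*x1 - 3*x2` are invented, and CatBoost itself fits gradient-boosted decision trees rather than a linear formula.

```python
import numpy as np

# Two predictors and a target generated from a known linear rule (no noise)
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1.0 + 2.0 * x1 - 3.0 * x2

# Design matrix with a leading column of ones for the intercept
A = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimate of [intercept, coefficient of x1, coefficient of x2]
coefficients, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coefficients, 6))
```

Because the target was generated without noise, the recovered coefficients match the rule used to create it up to floating-point error.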

