Text Features to Numerical Features using CatBoost: Implementation
Step 1: Install and Import CatBoost
Ensure you have CatBoost installed:
!pip install catboost
Import CatBoost and pandas:
from catboost import CatBoostClassifier, Pool
import pandas as pd
Step 2: Prepare Dataset
We’ll illustrate the procedure with a small example dataset that contains two categorical features, “City” and “Weather”, and a binary “Label” column:
data = {
'City': ['New York', 'London', 'Tokyo', 'New York', 'Tokyo'],
'Weather': ['Sunny', 'Rainy', 'Sunny', 'Snowy', 'Rainy'],
'Label': [1, 0, 1, 0, 0]
}
df = pd.DataFrame(data)
Step 3: Define Features and Target
Separate the feature columns from the target variable:
X = df[['City', 'Weather']]
y = df['Label']
Step 4: Initialize and Train the Model
List the categorical features and initialize the CatBoostClassifier. Then create a Pool object, which holds the data and tells CatBoost which columns are categorical, and fit the model:
categorical_features = ['City', 'Weather']
model = CatBoostClassifier(iterations=100, depth=3, learning_rate=0.1, loss_function='Logloss')
train_pool = Pool(data=X, label=y, cat_features=categorical_features)
model.fit(train_pool)
Step 5: Inspect Feature Importances
During training, CatBoost converts the categorical features to numbers internally, so the encoded values are not returned as a new dataset. To see how much each feature contributes to the model’s predictions, you can examine the feature importances:
importances = model.get_feature_importance(train_pool, prettified=True)
print(importances)
Output:
  Feature Id  Importances
0       City    82.857487
1    Weather    17.142513
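CatBoost does not expose the encoded categorical values directly, but the idea behind its encoding can be sketched by hand. The following is a simplified illustration of an ordered target statistic, assuming a fixed row order, a prior of 0.5, and a prior weight of 1 (the function name and these values are hypothetical choices for this example). CatBoost's actual implementation uses random permutations and feature combinations, so its internal values will differ.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'London', 'Tokyo', 'New York', 'Tokyo'],
    'Label': [1, 0, 1, 0, 0],
})

def ordered_target_statistic(categories, labels, prior=0.5, prior_weight=1.0):
    """Encode each row using only the labels of EARLIER rows in the same category.

    Simplified sketch: a single fixed ordering, no permutations.
    """
    label_sums = {}   # running sum of labels seen so far, per category
    counts = {}       # running count of rows seen so far, per category
    encoded = []
    for cat, label in zip(categories, labels):
        s = label_sums.get(cat, 0.0)
        c = counts.get(cat, 0)
        # Smoothed mean of the target over previously seen rows of this category
        encoded.append((s + prior_weight * prior) / (c + prior_weight))
        # Update the running statistics only after encoding the current row
        label_sums[cat] = s + label
        counts[cat] = c + 1
    return encoded

df['City_encoded'] = ordered_target_statistic(df['City'], df['Label'])
print(df['City_encoded'].tolist())  # [0.5, 0.5, 0.5, 0.75, 0.75]
```

The first occurrence of each city falls back to the prior (0.5), while later occurrences blend in the labels seen so far. Because a row never sees its own label, the target does not leak into its own feature.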
Transform Text Features to Numerical Features with CatBoost
Handling text and categorical data well is essential for building accurate predictive models. CatBoost, Yandex’s gradient boosting library, supports categorical features natively and provides built-in methods for converting text features into numerical ones, both of which can improve model performance. This article focuses on how to transform text features into numerical features using CatBoost.
Table of Contents
- Text Processing in CatBoost
- Steps to Transform Text Features to Numerical Features
- 1. Loading and Storing Text Features
- 2. Preprocessing Text Features
- 3. Calculating New Features
- 4. Training the Model
- Text Features to Numerical Features using CatBoost: Implementation