Gaussian Naive Bayes
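Gaussian Naive Bayes is the application of Naive Bayes to continuous features that are assumed to be normally distributed. It assumes that the likelihood $P(x_i \mid y)$ of each feature $x_i$ within each class $y$ follows a Gaussian distribution. Therefore,
$$P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$$
where $\mu_y$ and $\sigma_y^2$ are the mean and variance of feature $x_i$, estimated from the training samples of class $y$.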
To classify a new data point x, the algorithm computes the posterior probability of every class and assigns x to the class with the highest posterior.
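The Gaussian likelihood and the maximum-posterior rule described above can be sketched from scratch in a few lines of NumPy. This is a minimal illustration, not scikit-learn's implementation: the function names (`gaussian_log_pdf`, `fit_gnb`, `predict_gnb`), the toy data, and the small variance floor are all assumptions made for the example. Log-probabilities are used so that products of many small densities do not underflow.

```python
import numpy as np

def gaussian_log_pdf(x, mean, var):
    """Log of the Gaussian density N(x; mean, var), evaluated elementwise."""
    return -0.5 * np.log(2.0 * np.pi * var) - ((x - mean) ** 2) / (2.0 * var)

def fit_gnb(X, y):
    """Estimate per-class priors, feature means, and feature variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small floor keeps variances strictly positive (an assumption of this sketch)
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances

def predict_gnb(X, classes, priors, means, variances):
    """Assign each row of X to the class with the largest log-posterior."""
    # Shape (n_samples, n_classes, n_features) via broadcasting
    log_lik = gaussian_log_pdf(X[:, None, :], means[None, :, :], variances[None, :, :])
    # log P(y) + sum_i log P(x_i | y): the "naive" independence assumption
    log_post = np.log(priors)[None, :] + log_lik.sum(axis=2)
    return classes[np.argmax(log_post, axis=1)]

# Toy data (made up for illustration): two well-separated classes, two features
X_toy = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
                  [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

params = fit_gnb(X_toy, y_toy)
preds = predict_gnb(np.array([[1.1, 2.1], [5.1, 5.9]]), *params)
print(preds)  # [0 1]
```

Each query point lies next to one of the two clusters, so it is assigned to that cluster's class.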
Real-life example with Gaussian Naive Bayes:
Here we apply Gaussian Naive Bayes to the Iris dataset. The dataset has four features (sepal length, sepal width, petal length, and petal width, all in cm), and from these features we have to identify which species class each sample belongs to. The Iris flower dataset can be obtained from here.
We will now use Gaussian Naive Bayes to predict the species of an Iris flower.
Let's break down the code step by step:
- First, we import the required libraries: pandas for data manipulation, train_test_split to split the data into training and testing sets, GaussianNB for the Gaussian Naive Bayes classifier, accuracy_score to evaluate the model, and LabelEncoder to encode the categorical target variable.
Python3
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
- Next, we load the Iris dataset from a CSV file named “Iris.csv” into a pandas DataFrame.
- Then we separate the features (X) from the target variable (y). The features are obtained by dropping the “Species” column, and the target is the “Species” column itself, which is what we want to predict.
Python3
# Load the Iris dataset
data = pd.read_csv("Iris.csv")

# Select features and target
X = data.drop("Species", axis=1)
y = data["Species"]
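If no local CSV is at hand, roughly the same DataFrame can be built from the copy of the Iris dataset bundled with scikit-learn. The sketch below is an alternative to the file-based loading step, not the article's method; renaming the target column to "Species" is done here only to mirror the CSV layout. Some CSV copies of the dataset also carry a row-identifier column (assumed here to be called "Id"); if yours does, it is worth dropping as well, because an index that follows the row order can reveal the class when rows are sorted by species.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the bundled Iris data as a DataFrame (features plus an integer "target" column)
iris = load_iris(as_frame=True)
data = iris.frame.rename(columns={"target": "Species"})

# Replace integer codes with species names so the column resembles the CSV version
data["Species"] = data["Species"].map(dict(enumerate(iris.target_names)))

# Drop the target and, if present, a hypothetical "Id" column from the features
X = data.drop(columns=["Species", "Id"], errors="ignore")
y = data["Species"]

print(X.shape)
print(sorted(y.unique()))
```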
- Since the target variable “Species” is categorical, we use LabelEncoder to convert it into integer class labels. scikit-learn's GaussianNB also accepts string class labels directly, so this step is a convenience rather than a requirement; only the features must be numerical.
- We split the dataset into training and testing sets using the train_test_split function: 70% of the data is used for training and 30% for testing. Fixing the random_state parameter makes the split reproducible across runs.
Python3
# Encode the Species column to get numerical class labels
le = LabelEncoder()
y = le.fit_transform(y)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
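To see what the encoding and the split actually produce, the following sketch runs both on stand-in data. The label strings and the dummy feature matrix are made up for illustration, and the `stratify` argument is an optional addition that the walkthrough above does not use; it keeps the class proportions identical in the training and testing sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Stand-in labels: 50 samples of each of three classes (illustrative strings)
species = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica"] * 50)

le = LabelEncoder()
y_enc = le.fit_transform(species)

# Classes are sorted alphabetically and mapped to 0, 1, 2
print(le.classes_)
# The mapping can be reversed to recover the original names
print(le.inverse_transform([0, 2]))

# Dummy feature matrix with the same shape as the Iris features
X_dummy = np.arange(150 * 4).reshape(150, 4)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_dummy, y_enc, test_size=0.3, random_state=42, stratify=y_enc
)

print(X_tr.shape, X_te.shape)  # (105, 4) (45, 4)
print(np.bincount(y_te))       # [15 15 15]
```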
- We create a Gaussian Naive Bayes classifier (gnb) and train it on the training data using the fit method.
Python3
# Gaussian Naive Bayes classifier
gnb = GaussianNB()

# Train the classifier on the training data
gnb.fit(X_train, y_train)
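Fitting a Gaussian Naive Bayes model amounts to estimating a prior for each class and a mean and variance for each feature within each class. The sketch below inspects some of those estimates on a fitted model. It uses scikit-learn's bundled copy of the Iris data instead of the CSV file, which is an assumption made so that the example is self-contained.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

gnb = GaussianNB().fit(X_train, y_train)

# Class priors: the fraction of training samples in each class
print(gnb.class_prior_)

# Per-class feature means, shape (n_classes, n_features)
print(gnb.theta_)

# The stored means match the means computed directly from the training data
manual_mean = X_train[y_train == 0].mean(axis=0)
print(np.allclose(gnb.theta_[0], manual_mean))  # True

# Posterior probabilities for the first three test samples; each row sums to 1
proba = gnb.predict_proba(X_test[:3])
print(proba.round(3))
```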
- Finally, we use the trained model to make predictions on the testing data and compute the accuracy of those predictions.
Python3
# Make predictions on the testing data
y_pred = gnb.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"The Accuracy of Prediction on Iris Flower is: {accuracy}")
Output:
The Accuracy of Prediction on Iris Flower is: 1.0
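An accuracy measured on a single train/test split depends on which samples happen to land in the test set. One way to get a more stable estimate is k-fold cross-validation, sketched below with scikit-learn's `cross_val_score`. The use of the bundled Iris data and the choice of five folds are assumptions made for the example, so the numbers need not match the single-split result above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Accuracy on each of 5 stratified folds; every sample is tested exactly once
scores = cross_val_score(GaussianNB(), X, y, cv=5)

print(scores)
print(f"Mean accuracy over 5 folds: {scores.mean():.3f}")
```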
Gaussian Naive Bayes
In the vast field of machine learning, classification algorithms play a pivotal role in making sense of data. One such algorithm, Gaussian Naive Bayes, stands out for its simplicity, efficiency, and effectiveness. In this article, we will delve into the principles behind Gaussian Naive Bayes, explore its applications, and understand why it is a popular choice for various tasks.