Outlier Removal in Dataset using IQR

In this example, we use the interquartile range (IQR) method to detect and remove outliers in the ‘bmi’ column of the diabetes dataset. The code calculates the lower and upper limits as Q1 - 1.5*IQR and Q3 + 1.5*IQR, uses np.where to find the positions of the rows whose ‘bmi’ value lies on or beyond those limits, and drops those rows from the DataFrame in place. The shapes of the DataFrame before and after removal are printed for comparison.

Python3

# Import the libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_diabetes
 
# Load the dataset
diabetes = load_diabetes()
 
# Create the DataFrame with the feature names as column labels
df_diabetes = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
print("Old Shape: ", df_diabetes.shape)
 
# Detection using the IQR
# Calculate the quartiles, the IQR, and the lower and upper limits
Q1 = df_diabetes['bmi'].quantile(0.25)
Q3 = df_diabetes['bmi'].quantile(0.75)
IQR = Q3 - Q1
lower = Q1 - 1.5*IQR
upper = Q3 + 1.5*IQR
 
# Find the integer positions of the outlier rows
# (they match the index labels because the DataFrame has its default RangeIndex)
upper_array = np.where(df_diabetes['bmi'] >= upper)[0]
lower_array = np.where(df_diabetes['bmi'] <= lower)[0]
 
# Removing the outliers
df_diabetes.drop(index=upper_array, inplace=True)
df_diabetes.drop(index=lower_array, inplace=True)
 
# Print the new shape of the DataFrame
print("New Shape: ", df_diabetes.shape)

Output:

Old Shape:  (442, 10)
New Shape:  (439, 10)
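
The same filtering can also be written with a Boolean mask instead of positional indices, which avoids relying on the default RangeIndex. The following is a minimal sketch under stated assumptions, not the article's own code: the helper name remove_iqr_outliers, its parameter k, and the toy data are hypothetical and chosen only for illustration.

```python
import pandas as pd


def remove_iqr_outliers(df, column, k=1.5):
    """Return a copy of df without rows whose `column` value lies
    on or beyond the IQR limits (hypothetical helper, for illustration)."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Keep only rows strictly inside the limits, so values equal to a
    # limit are dropped, as with the >= and <= comparisons in the example
    mask = (df[column] > lower) & (df[column] < upper)
    return df[mask].copy()


# Made-up values; the index labels are deliberately not 0..n-1
toy = pd.DataFrame(
    {"bmi": [21.0, 22.5, 23.0, 24.0, 25.5, 60.0]},
    index=list("abcdef"),
)
cleaned = remove_iqr_outliers(toy, "bmi")
print("Old Shape: ", toy.shape)
print("New Shape: ", cleaned.shape)
```

Because the mask is aligned on index labels, this version also works after the DataFrame has been filtered or re-indexed, and it leaves the original DataFrame unchanged.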

Detect and Remove the Outliers using Python

Outliers are data points that deviate significantly from the rest of the data. They can distort measures of central tendency, such as the mean, and affect the results of statistical analyses. The article explores common causes of outliers, from errors to intentional introduction, and highlights their relevance to outlier mining in data analysis.
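
To make the effect on central tendency concrete, here is a minimal sketch with made-up numbers (the values are assumptions for illustration, not taken from the diabetes dataset). It compares the mean and median of a small sample with and without one extreme value.

```python
import statistics

# Made-up readings, plus the same readings with one extreme value added
values = [21.0, 22.5, 23.0, 24.0, 25.5]
with_outlier = values + [60.0]

mean_clean = statistics.mean(values)
median_clean = statistics.median(values)
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print("Without outlier: mean =", mean_clean, "median =", median_clean)
print("With outlier:    mean =", round(mean_out, 2), "median =", median_out)
```

In this sample, the single extreme value pulls the mean up by about six units while the median moves by only half a unit, which is why quartile-based rules such as the IQR method are less sensitive to extreme values than rules based on the mean.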
