Handling Outliers

Outliers can badly affect a model. In linear regression, for example, a single outlying data point can contribute a very large squared error and inflate the mean squared error. Handling outliers is therefore a useful step in EDA. Some models, such as decision trees and ensemble methods like random forests, are not much affected by outliers. However, it is still good practice to handle them.
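To make the effect on the mean squared error concrete, here is a minimal sketch on synthetic data (not the gold price dataset). The helper name `fit_and_mse` and all the numbers are made up for illustration. It fits a straight line with NumPy and compares the error before and after a single point is corrupted.

```python
import numpy as np

def fit_and_mse(x, y):
    """Fit y = a*x + b by least squares and return the mean squared error."""
    a, b = np.polyfit(x, y, deg=1)
    residuals = y - (a * x + b)
    return float(np.mean(residuals ** 2))

# Synthetic, perfectly linear data: y = 2x + 1
x = np.arange(10, dtype=float)
y_clean = 2 * x + 1

# Same data with a single corrupted point acting as an outlier
y_outlier = y_clean.copy()
y_outlier[9] += 50

mse_clean = fit_and_mse(x, y_clean)
mse_outlier = fit_and_mse(x, y_outlier)
print(f"MSE without outlier: {mse_clean:.4f}")
print(f"MSE with one outlier: {mse_outlier:.4f}")
```

One corrupted point out of ten is enough to move the fitted line and raise the error for every other point as well, which is why outliers deserve attention before training a regression model.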

Plotting Boxplot to Visualize the Outliers

Boxplots are useful for showing the spread and skewness of the data, and they also show individual outlier points. The box spans the 25% to 75% quantiles (the interquartile range), and the line in the middle of the box marks the median. The whiskers extend from the box to the lowest and highest values that are not classed as outliers; by default in seaborn, these are the values within 1.5 times the interquartile range of the box edges. Points beyond the whiskers are drawn individually as outliers.
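The quantities a boxplot draws can also be computed directly. The sketch below assumes the common 1.5 × IQR whisker rule (the default in matplotlib and seaborn); the function name `boxplot_stats` and the sample values are hypothetical, and exact quartile values can differ slightly between libraries depending on the interpolation method.

```python
import numpy as np

def boxplot_stats(values, whis=1.5):
    """Return the box, whisker and outlier values a boxplot would draw."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Whiskers end at the most extreme data points still inside the fences
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": outliers.tolist(),
    }

stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

In this toy sample only the value 100 falls outside the fences, so it is the single point a boxplot would draw beyond the whisker.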

Python3

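import matplotlib.pyplot as plt
import seaborn as sns

# One boxplot per numeric column (every column except Date)
fig = plt.figure(figsize=(8, 8))
temp = dataset.drop("Date", axis=1).columns.tolist()
for i, item in enumerate(temp):
    plt.subplot(2, 3, i+1)  # a 2 x 3 grid holds up to six plots
    sns.boxplot(data=dataset, x=item, color='violet')
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=2.0)
plt.show()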

Output:

Outliers in columns using Boxplot

The boxplots clearly show that the ‘USO’ column contains outliers, so we create a function that caps the outliers in a column.

Python3

def outlier_removal(column):
    # Cap outliers at the 5th and 95th percentiles (no rows are dropped)
    upper_limit = column.quantile(.95)  # 95th percentile
    lower_limit = column.quantile(.05)  # 5th percentile
    # clip returns a new Series instead of modifying the input in place
    return column.clip(lower=lower_limit, upper=upper_limit)

Here we set the upper limit to the 95th percentile of the column and the lower limit to the 5th percentile. Values greater than the 95th percentile are replaced with the 95th-percentile value, and values lower than the 5th percentile are replaced with the 5th-percentile value. No rows are removed; the extreme values are only capped.
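As a quick illustration of percentile capping, the sketch below applies the same idea to a small made-up column. The function name `cap_at_percentiles`, its parameters, and the toy values are assumptions for this example only; it uses `Series.clip`, which returns a new Series and leaves the input unchanged.

```python
import pandas as pd

def cap_at_percentiles(column, lower_q=0.05, upper_q=0.95):
    """Return a copy of `column` with values clipped to the given quantiles."""
    lower_limit = column.quantile(lower_q)
    upper_limit = column.quantile(upper_q)
    return column.clip(lower=lower_limit, upper=upper_limit)

# Toy column: the values 1..20 plus one extreme value
toy = pd.Series(list(range(1, 21)) + [500], dtype=float)
capped = cap_at_percentiles(toy)

print("limits:", toy.quantile(0.05), toy.quantile(0.95))
print("max before:", toy.max(), "max after:", capped.max())
print("rows before:", len(toy), "rows after:", len(capped))
```

The number of rows stays the same, and only the values outside the two limits change. Note that this approach always modifies the lowest and highest 5% of values, whether or not they are genuine outliers.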

Python3

# Cap outliers in the selected numeric columns (Date is excluded)
dataset[['SPX', 'GLD', 'USO', 'EUR/USD']] = \
    dataset[['SPX', 'GLD', 'USO', 'EUR/USD']].apply(outlier_removal)

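Here we use the pandas apply function to run outlier_removal on each of the selected columns. By default, DataFrame.apply passes one column at a time to the function, so each column is capped against its own percentiles.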
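The column-by-column behaviour of `DataFrame.apply` can be checked on a small made-up frame. The column names `A` and `B` and their values are hypothetical; the sketch only relies on `apply` using `axis=0` by default.

```python
import pandas as pd

toy = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 100.0],
    "B": [10.0, 20.0, 30.0, 40.0],
})

# Default axis=0: the function receives one column (a Series) per call,
# so the result is indexed by column name
col_max = toy.apply(lambda col: col.max())

# axis=1: the function receives one row per call instead
row_max = toy.apply(lambda row: row.max(), axis=1)

# Capping with apply therefore uses each column's own quantiles
capped = toy.apply(
    lambda col: col.clip(col.quantile(0.05), col.quantile(0.95))
)

print(col_max)
print(row_max)
print(capped)
```

Because each column is processed separately, the extreme value in `A` is capped relative to the other values in `A` and has no influence on the limits used for `B`.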

Gold Price Prediction using Machine Learning

In this article, we will build a gold price prediction project from scratch. Any data science project involves certain steps, which need not always be followed in the same order. In our project, we will go through these steps sequentially.

Problem Formulation
Data preprocessing
Data wrangling
Model Development
Model Explainability
Model Deployment
