Handling Outliers
Once outliers are detected, several techniques can be used to address them:
- Removing outliers: The simplest approach is to drop the outliers from the dataset. However, removal can discard valuable information. When the outliers are valid data points rather than errors, it may be more appropriate to leave them unchanged.
- Transformation: Transforming a variable reduces the influence of extreme values by compressing the scale, so that outliers are brought closer to the rest of the data. Common choices include scaling, the cube root transformation, the log transformation, and the Box-Cox transformation.
- Imputation: Imputation replaces missing values or outliers in the dataset with an estimated value. Typical estimates are the mean, the median, or zero. The median is often preferred because, unlike the mean, it is not itself pulled toward the outliers.
- Robust estimators: Robust estimators are insensitive to outliers, which limits the impact of outliers on statistical analyses. Examples include M-estimators and robust regression, which fits a regression model in a way that down-weights extreme observations instead of removing them.
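The four techniques above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a recommended pipeline: the function names, the sample data, and the use of Tukey's 1.5 x IQR fences to flag outliers are all assumptions made for the example, and the median with the median absolute deviation (MAD) stands in here as a simple robust estimator in place of a full robust regression.

```python
import math
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) Tukey fences; k=1.5 is a common convention (assumed here)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Removal: keep only the points that fall inside the fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


def log_transform(values):
    """Transformation: log(1 + x) compresses large values; requires x >= 0."""
    return [math.log1p(v) for v in values]


def impute_outliers_with_median(values, k=1.5):
    """Imputation: replace flagged points with the median of the remaining points."""
    lower, upper = iqr_bounds(values, k)
    inliers = [v for v in values if lower <= v <= upper]
    fill = statistics.median(inliers)
    return [v if lower <= v <= upper else fill for v in values]


def median_and_mad(values):
    """Robust estimators: median (location) and median absolute deviation (spread)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med, mad


# Hypothetical sample with one extreme value.
data = [10, 12, 11, 13, 12, 11, 14, 95]

cleaned = remove_outliers(data)
transformed = log_transform(data)
imputed = impute_outliers_with_median(data)
center, spread = median_and_mad(data)

print("removed:", cleaned)
print("log-transformed:", [round(v, 2) for v in transformed])
print("imputed:", imputed)
print("mean vs median:", statistics.mean(data), center)
print("MAD:", spread)
```

Comparing the mean with the median on the sample shows why robust estimators help: the single extreme value shifts the mean noticeably, while the median and MAD barely move. Which of the four approaches is appropriate depends on whether the extreme value is an error or a valid observation.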
Outlier Detection in Logistic Regression
Outliers, data points that deviate markedly from the rest, can substantially degrade the performance of logistic regression models. In this article we will explore various techniques for detecting and handling outliers in logistic regression.