Creating a Dataset to apply feature scaling in R
First, we need to create a dataframe.
R
# Age vector
age <- c(19, 20, 21, 22, 23, 24, 24, 26, 27)
# Salary vector
salary <- c(10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000)
# Data frame created using age and salary
df <- data.frame("Age" = age, "Salary" = salary, stringsAsFactors = FALSE)
df
Output:
Age Salary
1 19 10000
2 20 20000
3 21 30000
4 22 40000
5 23 50000
6 24 60000
7 24 70000
8 26 80000
9 27 90000
Now that the dataset is created, we can start implementing feature scaling.
Using the General Formula
We know the formulas for both standardization and normalization. Let’s apply them one by one.
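For reference, the two formulas can be written as below. This is the usual textbook form, assuming $\bar{x}$ is the column mean and $s$ is the sample standard deviation (the quantity R's sd() computes); $z$ is the standardized value and $x'$ the normalized one.

```latex
z = \frac{x - \bar{x}}{s},
\qquad
x' = \frac{x - \min(x)}{\max(x) - \min(x)}
```

For example, Age = 19 in the data frame above has $\bar{x} \approx 22.889$ and $s \approx 2.667$, so $z \approx -1.458$, and since 19 is the smallest age, $x' = (19 - 19)/(27 - 19) = 0$.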
Implementing Standardization
R
# Standardize every column of df: subtract the mean, divide by the standard deviation
data <- as.data.frame(sapply(df, function(x) (x - mean(x)) / sd(x)))
data
Output:
Age Salary
1 -1.45833333 -1.4605935
2 -1.08333333 -1.0954451
3 -0.70833333 -0.7302967
4 -0.33333333 -0.3651484
5 0.04166667 0.0000000
6 0.41666667 0.3651484
7 0.41666667 0.7302967
8 1.16666667 1.0954451
9 1.54166667 1.4605935
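Base R's scale() function applies the same formula column by column, so it can serve as a cross-check on the hand-written version. The sketch below rebuilds the toy data so it runs on its own; the variable names scaled, col_means and col_sds are illustrative only.

```r
# Same toy data as in this article
df <- data.frame(
  Age    = c(19, 20, 21, 22, 23, 24, 24, 26, 27),
  Salary = c(10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000)
)

# scale() centers each column on its mean and divides by its standard deviation.
# It returns a matrix, so convert the result back to a data frame.
scaled <- as.data.frame(scale(df))

# Sanity checks: each column should now have mean ~0 and standard deviation 1
col_means <- sapply(scaled, mean)
col_sds   <- sapply(scaled, sd)
print(round(col_means, 10))
print(col_sds)
```

Checking the mean and standard deviation after scaling is a quick way to confirm the transformation did what was intended.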
Implementing Normalization
R
# Normalize every column of df to the range [0, 1] (min-max scaling)
data2 <- as.data.frame(sapply(df, function(x) (x - min(x)) / (max(x) - min(x))))
data2
Output:
Age Salary
1 0.000 0.000
2 0.125 0.125
3 0.250 0.250
4 0.375 0.375
5 0.500 0.500
6 0.625 0.625
7 0.625 0.750
8 0.875 0.875
9 1.000 1.000
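The min-max formula divides by max(x) - min(x), which is zero for a constant column. A minimal sketch of a reusable helper that guards against this is shown below; min_max is a hypothetical name, not a base R function, and returning zeros for a constant column is an arbitrary choice made for this example.

```r
# Hypothetical helper (not part of base R): min-max scale one numeric vector
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # Constant column: the formula would divide by zero, so return zeros
    # (x * 0 keeps any NA values as NA)
    return(x * 0)
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

age <- c(19, 20, 21, 22, 23, 24, 24, 26, 27)
print(min_max(age))          # values between 0 and 1
print(min_max(c(5, 5, 5)))   # constant input handled without NaN
```

The helper can be applied to a whole data frame with as.data.frame(lapply(df, min_max)).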
Using Caret Library
Let’s load the caret library and then apply standardization and normalization.
Standardization Using Caret Library
R
# Importing library
library(caret)
# Standardization:
data1.pre <- preProcess(df, method = c("center", "scale"))
data1 <- predict(data1.pre, df)
data1
Output:
Age Salary
1 -1.45833333 -1.4605935
2 -1.08333333 -1.0954451
3 -0.70833333 -0.7302967
4 -0.33333333 -0.3651484
5 0.04166667 0.0000000
6 0.41666667 0.3651484
7 0.41666667 0.7302967
8 1.16666667 1.0954451
9 1.54166667 1.4605935
Normalization Using Caret Library
R
# Normalization:
data2.pre <- preProcess(df, method = "range")
data2 <- predict(data2.pre, df)
data2
Output:
Age Salary
1 0.000 0.000
2 0.125 0.125
3 0.250 0.250
4 0.375 0.375
5 0.500 0.500
6 0.625 0.625
7 0.625 0.750
8 0.875 0.875
9 1.000 1.000
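caret splits the work into two steps: preProcess() learns the scaling parameters and predict() applies them. The practical benefit is that parameters learned on one data set can be reused on another, such as new observations. The sketch below illustrates that idea in base R rather than caret, so it runs without extra packages; train, new_rows, centers and scales are illustrative names, and the new rows are made-up values.

```r
# Training data: the same toy data used in this article
train <- data.frame(
  Age    = c(19, 20, 21, 22, 23, 24, 24, 26, 27),
  Salary = c(10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000)
)

# Made-up new observations, for illustration only
new_rows <- data.frame(Age = c(25, 30), Salary = c(55000, 100000))

# Step 1: learn the parameters from the training data only
centers <- sapply(train, mean)
scales  <- sapply(train, sd)

# Step 2: apply those same parameters to the new data
new_scaled <- as.data.frame(scale(new_rows, center = centers, scale = scales))
print(new_scaled)
```

Reusing the training parameters keeps new data on the same scale the model was fitted on, instead of rescaling each data set by its own mean and standard deviation.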
Using Dplyr Library
Let’s load the dplyr library and then apply standardization and normalization.
Standardization Using Dplyr Library
R
# Importing library
library(dplyr)
# Standardization: scale() centers every column and divides by its standard deviation
data2 <- df %>% mutate(across(everything(), ~ as.numeric(scale(.x))))
data2
Output:
Age Salary
1 -1.45833333 -1.4605935
2 -1.08333333 -1.0954451
3 -0.70833333 -0.7302967
4 -0.33333333 -0.3651484
5 0.04166667 0.0000000
6 0.41666667 0.3651484
7 0.41666667 0.7302967
8 1.16666667 1.0954451
9 1.54166667 1.4605935
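To scale only some columns with dplyr, across() accepts a column selection. The following is a minimal sketch, assuming dplyr 1.0.0 or later is installed; salary_only is an illustrative name.

```r
library(dplyr)

# Same toy data as in this article
df <- data.frame(
  Age    = c(19, 20, 21, 22, 23, 24, 24, 26, 27),
  Salary = c(10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000)
)

# Standardize only Salary; Age is left untouched.
# as.numeric() flattens the one-column matrix that scale() returns.
salary_only <- df %>%
  mutate(across(all_of("Salary"), ~ as.numeric(scale(.x))))

print(salary_only)
```

Passing a character vector with several names to all_of() scales each of those columns in the same call.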
Normalization Using Dplyr Library
R
# Importing library
library(dplyr)
# Normalization: rescale every column to the range [0, 1]
data1 <- df %>% mutate(across(everything(), ~ (.x - min(.x)) / (max(.x) - min(.x))))
data1
Output:
Age Salary
1 0.000 0.000
2 0.125 0.125
3 0.250 0.250
4 0.375 0.375
5 0.500 0.500
6 0.625 0.625
7 0.625 0.750
8 0.875 0.875
9 1.000 1.000
Using BBmisc package
BBmisc is an R package whose normalize() function can perform both standardization and normalization.
Standardization Using BBmisc package
R
# Load library
library(BBmisc)
# Standardize the Age and Salary columns (zero mean, unit standard deviation)
df_standardized <- BBmisc::normalize(df, method = "standardize")
df_standardized
Output:
Age Salary
1 -1.45833333 -1.4605935
2 -1.08333333 -1.0954451
3 -0.70833333 -0.7302967
4 -0.33333333 -0.3651484
5 0.04166667 0.0000000
6 0.41666667 0.3651484
7 0.41666667 0.7302967
8 1.16666667 1.0954451
9 1.54166667 1.4605935
Normalization Using BBmisc package
R
# Load library
library(BBmisc)
# Normalize the Age and Salary columns using min-max normalization
df_normalized <- BBmisc::normalize(df, method = "range")
df_normalized
Output:
Age Salary
1 0.000 0.000
2 0.125 0.125
3 0.250 0.250
4 0.375 0.375
5 0.500 0.500
6 0.625 0.625
7 0.625 0.750
8 0.875 0.875
9 1.000 1.000
Feature Scaling Using R
Feature scaling is a preprocessing technique that can improve the performance of many machine learning models. It works by transforming numeric features onto a comparable scale, so that features with large values, such as Salary in this example, do not dominate features with small values, such as Age. Feature scaling is widely used in many fields, including business analytics and clinical data science.
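One place where scale matters is in distance calculations, which many methods rely on. The sketch below uses base R's dist() on the toy data from this article to compare the Euclidean distance between the first two rows before and after standardization; raw_dist and scaled_dist are illustrative names.

```r
# Same toy data as in this article
df <- data.frame(
  Age    = c(19, 20, 21, 22, 23, 24, 24, 26, 27),
  Salary = c(10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000)
)

# Distance between rows 1 and 2 on the raw data:
# the Salary gap (10000) swamps the Age gap (1)
raw_dist <- as.numeric(dist(df[1:2, ]))

# Distance between the same rows after standardizing both columns:
# Age and Salary now contribute on a similar scale
scaled_df   <- as.data.frame(scale(df))
scaled_dist <- as.numeric(dist(scaled_df[1:2, ]))

print(raw_dist)
print(scaled_dist)
```

On the raw data the distance is almost exactly the salary difference, which means Age has practically no influence until the columns are scaled.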