Creating a Dataset to apply feature scaling in R

First, we need to create a dataframe.

R

# Age vector
age <- c(19,20,21,22,23,24,24,26,27)
 
# Salary vector
salary <- c(10000,20000,30000,40000,
            50000,60000,70000,80000,90000)
 
# Data frame created using age and salary
df <- data.frame( "Age" = age,
                 "Salary" = salary,
                 stringsAsFactors = FALSE)
df

Output:

  Age Salary
1  19  10000
2  20  20000
3  21  30000
4  22  40000
5  23  50000
6  24  60000
7  24  70000
8  26  80000
9  27  90000

Once the dataset is created. Now we can start implementing Feature Scaling.

By using General Formula

We know the formulas for both standardization and normalization. Let’s apply them one by one.

implement standardization

R

data <- data.frame(Age = rnorm(500, 50, 8),
                   Weight = rnorm(500, 80, 10))
data <- as.data.frame(sapply(df, function(x) 
                      (x-mean(x))/sd(x)))
data

Output:

          Age     Salary                             
1 -1.45833333 -1.4605935
2 -1.08333333 -1.0954451
3 -0.70833333 -0.7302967
4 -0.33333333 -0.3651484
5  0.04166667  0.0000000
6  0.41666667  0.3651484
7  0.41666667  0.7302967
8  1.16666667  1.0954451
9  1.54166667  1.4605935

implement normalization

R

data2 <- data.frame(Age = rnorm(500, 50, 8),
                   Weight = rnorm(500, 80, 10))
data2 <- as.data.frame(sapply(df, function(x) 
                      (x-min(x))/(max(x)-min(x))))
data2

Output:

    Age Salary
1 0.000  0.000
2 0.125  0.125
3 0.250  0.250
4 0.375  0.375
5 0.500  0.500
6 0.625  0.625
7 0.625  0.750
8 0.875  0.875
9 1.000  1.000

Using Caret Library

Let’s import the library caret and then apply the Standardization and Normalisation.

Standardization Using Caret Library

R

# Importing Library
library(caret)
 
# Standardisation:
data1.pre <- preProcess(df, method=c("center", "scale"))
data1<- predict(data1.pre, df)
data1

Output:

          Age     Salary
1 -1.45833333 -1.4605935
2 -1.08333333 -1.0954451
3 -0.70833333 -0.7302967
4 -0.33333333 -0.3651484
5  0.04166667  0.0000000
6  0.41666667  0.3651484
7  0.41666667  0.7302967
8  1.16666667  1.0954451
9  1.54166667  1.4605935

Normalisation Using Caret Library

R

# Normalisation:
data2.pre <- preProcess(df, method="range")
data2 <- predict(data2.pre, df)
data2

Output:

    Age Salary
1 0.000  0.000
2 0.125  0.125
3 0.250  0.250
4 0.375  0.375
5 0.500  0.500
6 0.625  0.625
7 0.625  0.750
8 0.875  0.875
9 1.000  1.000

Using Dplyr Library

Let’s import the library dplyr and then apply the Standardization and Normalisation.

Standardization Using Dplyr Library

R

# Importing library
library(dplyr)
 
# Standardization
data2 <- df %>%
mutate_at(vars("Salary"), scale)
data2

Output:

          Age     Salary
1 -1.45833333 -1.4605935
2 -1.08333333 -1.0954451
3 -0.70833333 -0.7302967
4 -0.33333333 -0.3651484
5  0.04166667  0.0000000
6  0.41666667  0.3651484
7  0.41666667  0.7302967
8  1.16666667  1.0954451
9  1.54166667  1.4605935

Normalisation Using Dplyr Library

R

# Importing library
library(dplyr)
 
# Normalization
data1 <- df %>%
mutate_all(scale)
data1

Output:

  Age     Salary
1  19 -1.4605935
2  20 -1.0954451
3  21 -0.7302967
4  22 -0.3651484
5  23  0.0000000
6  24  0.3651484
7  24  0.7302967
8  26  1.0954451
9  27  1.4605935

Using BBmisc package

BBmisc is an R package so with the help of it we can calculate the standardization and normalization.

Standardization Using BBmisc package

R

# load library
library(BBmisc)
 
# Normalize the Age and Salary columns using min-max normalization
df_standardized <- BBmisc::normalize(df, method = "standardize")
 
df_standardized

Output:

           Age     Salary
1 -1.45833333 -1.4605935
2 -1.08333333 -1.0954451
3 -0.70833333 -0.7302967
4 -0.33333333 -0.3651484
5  0.04166667  0.0000000
6  0.41666667  0.3651484
7  0.41666667  0.7302967
8  1.16666667  1.0954451
9  1.54166667  1.4605935

Normalization Using BBmisc package

R

# load library
library(BBmisc)
 
# Normalize the Age and Salary columns using min-max normalization
df_normalized <- BBmisc::normalize(df, method = "range")
 
df_normalized

Output:

    Age Salary
1 0.000  0.000
2 0.125  0.125
3 0.250  0.250
4 0.375  0.375
5 0.500  0.500
6 0.625  0.625
7 0.625  0.750
8 0.875  0.875
9 1.000  1.000

Feature Scaling Using R

Feature scaling is a technique to improve the accuracy of machine learning models. This can be done by removing unreliable data points from the training set so that the model can learn useful information about relevant features. Feature scaling is widely used in many fields, including business analytics and clinical data science.

Creating a Dataset to apply feature scaling in R

R

By using General Formula

R

R

Using Caret Library

R

R

Using Dplyr Library

R

R

Using BBmisc package

R

R

Feature Scaling Using R

Categories

Contact US

Creating a Dataset to apply feature scaling in R

R

By using General Formula

R

R

Using Caret Library

R

R

Using Dplyr Library

R

R

Using BBmisc package

R

R

Feature Scaling Using R

Similar Reads

Categories

Contact US