How to Use scikit-learn StandardScaler in Python Pandas
Scikit-learn is a machine learning and model-building library. It supports many operations, such as preprocessing, analysis, and model building, for both supervised and unsupervised learning problems. Its preprocessing class StandardScaler is used to standardize data.
Syntax:
scaler = StandardScaler()
df = scaler.fit_transform(df)
In this example, we transform the whole dataset into a standardized form. To do that, we first create a StandardScaler object and then fit it to the data and transform the data.
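The fit and transform steps can also be run separately, which makes it easier to see what the scaler learns. The following is a minimal sketch under assumed data: the column names 'a' and 'b' and their values are hypothetical and chosen only for illustration. It also wraps the result back into a DataFrame, since the transform step returns a NumPy array rather than a DataFrame by default.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical data, used only for illustration
df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [10.0, 20.0, 60.0]})

scaler = StandardScaler()

# fit() learns the per-column mean and standard deviation
scaler.fit(df)
print(scaler.mean_)   # per-column means learned from df
print(scaler.scale_)  # per-column standard deviations learned from df

# transform() applies the learned statistics and returns a NumPy array
scaled = scaler.transform(df)

# Wrap the array back into a DataFrame to keep column names and index
scaled_df = pd.DataFrame(scaled, columns=df.columns, index=df.index)
print(scaled_df)
```

Calling fit_transform() does both steps in one call. Keeping them separate is useful when the same fitted scaler has to be applied to new data later.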
Example: Standardizing values
Python
# Importing the libraries
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Creating the data
details = {
    'col1': [1, 3, 5, 7, 9],
    'col2': [7, 4, 35, 14, 56]
}

# Creating a DataFrame object
df = pd.DataFrame(details)

# Define the standard scaler
scaler = StandardScaler()

# Fit and transform the data (the result is a NumPy array, not a DataFrame)
df = scaler.fit_transform(df)

# Display the standardized values
print(df)
Output:
How to Standardize Data in a Pandas DataFrame?
In this article, we will learn how to standardize the data in a Pandas DataFrame.
Standardization is an important feature-scaling technique, and feature scaling is an integral part of feature engineering. Data collected for analysis or machine learning usually contains many independent features, which in supervised learning are used to predict the dependent feature. Raw data often contains noise and features measured on very different scales, which puts the model at risk of being unduly influenced by outliers. To reduce this risk, the data is commonly normalized or standardized. The rest of this article focuses on standardization.
Standardization is another way of rescaling the data so that a machine learning model can learn from it more easily. It transforms each feature so that its mean becomes 0 and its standard deviation becomes 1.
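To make the definition concrete, standardization can be written by hand as z = (x - mean) / std, applied column by column. The sketch below uses plain pandas on the same values as the example data. It assumes the population standard deviation (ddof=0), which is the convention StandardScaler follows; the default in pandas is the sample standard deviation (ddof=1), so the argument is passed explicitly.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 3, 5, 7, 9],
    'col2': [7, 4, 35, 14, 56]
})

# z = (x - mean) / std, computed per column with the population std (ddof=0)
standardized = (df - df.mean()) / df.std(ddof=0)

# After standardization each column has mean 0 and standard deviation 1
# (up to floating-point rounding)
print(standardized.mean().round(6))
print(standardized.std(ddof=0).round(6))
```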
Another important difference is that normalization (min-max scaling) shrinks the values into a fixed range, typically 0 to 1, whereas standardization does not bound the values to any specific range.
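The difference in range can be checked on a single column. This is a small sketch that assumes min-max scaling as the form of normalization, and it reuses the col2 values from the example data. The normalized values stay between 0 and 1, while the standardized values can fall below 0 and above 1.

```python
import pandas as pd

s = pd.Series([7, 4, 35, 14, 56], name='col2')

# Min-max normalization: values end up in the range [0, 1]
normalized = (s - s.min()) / (s.max() - s.min())

# Standardization: mean 0 and standard deviation 1, with no fixed bounds
standardized = (s - s.mean()) / s.std(ddof=0)

print(normalized.min(), normalized.max())
print(standardized.min(), standardized.max())
```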