Pandas – Strip whitespace from Entire DataFrame
Our aim is to remove the extra whitespace from the cells of a DataFrame so that the data is consistent and easy to work with. We will use the following methods:
- Using the strip() function
- Using skipinitialspace
- Using the replace() function
- Using converters
Strip Whitespace from Entire DataFrame Using the strip() Function
Pandas provides the predefined method pandas.Series.str.strip() to remove whitespace from strings. It removes leading and trailing whitespace, that is, the spaces at the start and end of each string, and returns a new Series or Index.
Syntax: pandas.Series.str.strip(to_strip=None)
Parameter: to_strip (str or None, default None) is the set of characters to remove from the start and end of each string. If no characters are passed, leading and trailing whitespace is removed.
Returns: a Series or Index of objects.
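To make the to_strip parameter concrete, here is a minimal sketch on a small Series. The values, including the asterisks, are illustrative assumptions and not part of the article's dataset. It contrasts the default call with a call that passes a set of characters.

```python
import pandas as pd

# Illustrative values with stray spaces and asterisks around them
s = pd.Series(["  A+ ", "**B-**", "O+  "])

# Default (to_strip=None): only leading/trailing whitespace is removed,
# so the asterisks around 'B-' are kept
default_result = s.str.strip()

# to_strip=" *": any space or asterisk is removed from both ends;
# the '-' inside 'B-' is not in the set, so it is preserved
custom_result = s.str.strip(" *")

print(default_result.tolist())
print(custom_result.tolist())
```

Note that to_strip is treated as a set of characters, not as a substring: " *" removes every space and every asterisk at either end, in any order.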
In this example, the code creates a pandas DataFrame named ‘df’ with the columns ‘Names’, ‘Age’, ‘Blood Group’, and ‘Gender’, and removes the leading and trailing spaces from the ‘Names’, ‘Blood Group’, and ‘Gender’ columns using str.strip(). Because str.strip() returns a new Series instead of changing the DataFrame, each result is assigned back to its column, for example df['Names'] = df['Names'].str.strip().
Python3
# importing library
import pandas as pd

# Creating dataframe
df = pd.DataFrame({
    'Names': [' Sunny', 'Bunny', 'Ginny ', ' Binny ', ' Chinni', 'Minni'],
    'Age': [23, 44, 23, 54, 22, 11],
    'Blood Group': [' A+', ' B+', 'O+', 'O-', ' A-', 'B-'],
    'Gender': [' M', ' M', 'F', 'F', 'F', ' F']
})

# The dataset has a lot of extra spaces in its cells, so remove them with strip().
# str.strip() returns a new Series, so assign the result back to each column.
df['Names'] = df['Names'].str.strip()
df['Blood Group'] = df['Blood Group'].str.strip()
df['Gender'] = df['Gender'].str.strip()

# Printing dataframe
print(df)
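When several text columns need the same treatment, the per-column assignments can be collapsed into one statement. The sketch below rebuilds the same DataFrame and assumes the text columns are listed by hand in a list named text_cols (a hypothetical name). It applies str.strip() to each listed column with DataFrame.apply and assigns the results back in one step.

```python
import pandas as pd

df = pd.DataFrame({
    'Names': [' Sunny', 'Bunny', 'Ginny ', ' Binny ', ' Chinni', 'Minni'],
    'Age': [23, 44, 23, 54, 22, 11],
    'Blood Group': [' A+', ' B+', 'O+', 'O-', ' A-', 'B-'],
    'Gender': [' M', ' M', 'F', 'F', 'F', ' F'],
})

# Hypothetical list of the columns that hold text
text_cols = ['Names', 'Blood Group', 'Gender']

# apply() calls str.strip() on each selected column; assigning back
# replaces those columns, while 'Age' is left untouched
df[text_cols] = df[text_cols].apply(lambda col: col.str.strip())

print(df)
```

Listing the columns explicitly keeps numeric columns such as 'Age' out of the .str accessor, which only works on string data.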
Remove Space from Columns in Pandas Using skipinitialspace
skipinitialspace is not a method but a parameter of the pandas read_csv() method. When it is set to True, the parser skips the spaces that immediately follow a delimiter, so leading spaces in each field are dropped while the file is read. Spaces at the end of a field, just before the next delimiter, are not removed. By default, it is False.
Syntax: pandas.read_csv('path_of_csv_file', skipinitialspace=True)
# By default, skipinitialspace is False; set it to True to use this parameter.
In this example, we use skipinitialspace to strip whitespace while loading the data. The code reads a CSV file named ‘student_data.csv’ with skipinitialspace=True, which removes the spaces that follow each delimiter as the data is loaded into a DataFrame. Finally, it prints the contents of the DataFrame.
Python3
# importing library
import pandas as pd

# reading the csv file; skipinitialspace=True skips the spaces that follow each delimiter
df = pd.read_csv('student_data.csv', skipinitialspace=True)

# printing dataset
print(df)
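Because the contents of student_data.csv are not shown, the sketch below uses a small in-memory CSV read through io.StringIO. Its contents are hypothetical. It compares the default read with skipinitialspace=True and also illustrates the limitation of this parameter: only spaces that follow a delimiter are skipped, so the trailing space in 'Ginny ' survives.

```python
import io
import pandas as pd

# Hypothetical CSV contents: ' A+' has a space after the comma,
# 'Ginny ' has a space before the comma
csv_text = "Name,Blood Group\nSunny, A+\nGinny ,O+\n"

# Default: the space after the comma stays part of the value
default_df = pd.read_csv(io.StringIO(csv_text))

# skipinitialspace=True: the space after the comma is skipped
skip_df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

print(default_df["Blood Group"].tolist())
print(skip_df["Blood Group"].tolist())
# The trailing space in 'Ginny ' is not after a delimiter, so it remains
print(skip_df["Name"].tolist())
```

If trailing spaces matter, skipinitialspace has to be combined with one of the other methods, such as str.strip() or converters.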
Strip Whitespace from Entire DataFrame Using the replace() Function
The replace() function can also remove extra whitespace from a DataFrame. Pandas provides the predefined method pandas.Series.str.replace() for this. The program is the same as the strip() program; the only difference is that replace() is used in place of strip(). Note that replacing ' ' with '' deletes every space in a string, including spaces between words, not only the leading and trailing ones.
Syntax: pandas.Series.str.replace(' ', '')
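To see how replace() differs from strip(), the sketch below applies both to the same Series. The values, such as ' New York ', are illustrative and not from the article's dataset. regex=False is passed so that ' ' is matched as a literal space.

```python
import pandas as pd

# Illustrative values; ' New York ' has a space between words
s = pd.Series([" New York ", " A+", "O- "])

# strip(): only the ends are trimmed, the inner space is kept
stripped = s.str.strip()

# replace(' ', ''): every space is deleted, including the inner one
replaced = s.str.replace(" ", "", regex=False)

print(stripped.tolist())
print(replaced.tolist())
```

For single-token values such as blood groups the two give the same result, but for multi-word text replace(' ', '') joins the words together.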
In this example, we use the replace() function to strip whitespace from the DataFrame. The code removes the spaces in the ‘Names’, ‘Blood Group’, and ‘Gender’ columns of a pandas DataFrame named ‘df’ using str.replace(' ', ''). Like str.strip(), this method returns a new Series and does not modify the original DataFrame, so each result is assigned back to its column, for example df['Names'] = df['Names'].str.replace(' ', '').
Python3
# importing library
import pandas as pd

# Creating dataframe
df = pd.DataFrame({
    'Names': [' Sunny', 'Bunny', 'Ginny ', ' Binny ', ' Chinni', 'Minni'],
    'Age': [23, 44, 23, 54, 22, 11],
    'Blood Group': [' A+', ' B+', 'O+', 'O-', ' A-', 'B-'],
    'Gender': [' M', ' M', 'F', 'F', 'F', ' F']
})

# The dataset has a lot of extra spaces in its cells, so remove them with replace().
# str.replace() returns a new Series, so assign the result back to each column.
df['Names'] = df['Names'].str.replace(' ', '')
df['Blood Group'] = df['Blood Group'].str.replace(' ', '')
df['Gender'] = df['Gender'].str.replace(' ', '')

# Printing dataframe
print(df)
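If replace() should behave like strip(), removing only leading and trailing whitespace, a regular expression anchored at both ends can be used instead of a plain space. The sketch below shows that alternative on illustrative values: ^\s+ matches whitespace at the start of a string and \s+$ matches whitespace at the end.

```python
import pandas as pd

# Illustrative values: spaces, a tab, and an inner space between words
s = pd.Series([" Binny ", "Ginny\t", " New York "])

# Remove whitespace only at the start (^\s+) or the end (\s+$);
# regex=True makes pandas treat the pattern as a regular expression
trimmed = s.str.replace(r"^\s+|\s+$", "", regex=True)

print(trimmed.tolist())
```

Unlike replace(' ', ''), this keeps the space inside 'New York' and also removes tabs, matching what str.strip() does by default.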
Remove Space from Columns in Pandas Using Converters
Like skipinitialspace, converters is a parameter of the pandas read_csv() method. It applies a function to the values of particular columns as they are read. The functions are passed in a dictionary whose keys are column names and whose values are functions. Here we pass the str.strip function itself (without calling it), so the extra spaces are removed while the CSV file is being read.
Syntax : pd.read_csv("path_of_file", converters={'column_name': function_name})
# Pass a dict that maps column names (the keys) to functions (the values).
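The sketch below uses an in-memory CSV with hypothetical contents so it runs without student_data.csv. It builds the converters dictionary with a comprehension and passes str.strip as a function object; writing str.strip() instead would call the method immediately with no string and raise a TypeError.

```python
import io
import pandas as pd

# Hypothetical CSV contents with leading and trailing spaces
csv_text = "Name,Age,Blood Group\n Sunny ,23, A+\nGinny ,44,O+\n"

# Map each text column to the function str.strip (no parentheses)
converters = {col: str.strip for col in ["Name", "Blood Group"]}

# read_csv passes each raw cell string of those columns to str.strip
df = pd.read_csv(io.StringIO(csv_text), converters=converters)

print(df)
```

Columns without a converter, such as Age here, are still parsed normally, so their numeric type is inferred as usual.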
In this example, we use converters. The code reads a CSV file named ‘student_data.csv’ into a pandas DataFrame and uses the converters parameter to apply str.strip to the ‘Name’, ‘Blood Group’, and ‘Gender’ columns, removing leading and trailing spaces while the data is loaded. Finally, it prints the contents of the DataFrame.
Python3
# importing library
import pandas as pd

# reading the csv file; converters applies str.strip to every value of these columns
# (pass the function str.strip itself, not the result of calling str.strip())
df = pd.read_csv('student_data.csv',
                 converters={'Name': str.strip,
                             'Blood Group': str.strip,
                             'Gender': str.strip})

# printing dataset
print(df)
Removing Extra Whitespace from Whole DataFrame
The code defines a pandas DataFrame named ‘df’ with the columns ‘Names’, ‘Age’, ‘Blood_Group’, and ‘Gender’. It also defines a function called whitespace_remover that iterates over the columns of a given DataFrame, checks whether each column's data type is ‘object’, and applies the strip function to remove leading and trailing whitespace. Finally, the function is called on the DataFrame ‘df’, which it modifies in place, and the modified DataFrame is printed.
Python3
# Importing required libraries
import pandas as pd

# Creating a DataFrame with 4 columns, but
# the data has irregular spacing.
df = pd.DataFrame({
    'Names': [' Sunny', 'Bunny', 'Ginny ', ' Binny ', ' Chinni', 'Minni'],
    'Age': [23, 44, 23, 54, 22, 11],
    'Blood_Group': [' A+', ' B+', 'O+', 'O-', ' A-', 'B-'],
    'Gender': [' M', ' M', 'F', 'F', 'F', ' F']
})

# Creating a function which will remove extra leading
# and trailing whitespace from the data.
# Pass the dataframe as a parameter here.
def whitespace_remover(dataframe):
    # iterating over the columns
    for i in dataframe.columns:
        # checking the datatype of each column
        if dataframe[i].dtype == 'object':
            # applying the strip function on the column
            dataframe[i] = dataframe[i].map(str.strip)
        else:
            # if the condition is False, do nothing
            pass

# applying whitespace_remover function on dataframe
whitespace_remover(df)

# printing dataframe
print(df)
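As a more compact alternative to whitespace_remover, the sketch below strips every string cell and returns a new DataFrame instead of modifying the one passed in. The function name strip_all and the sample data are hypothetical. It checks each value with isinstance(value, str) rather than checking the column dtype, so numbers and missing values pass through unchanged, whereas map(str.strip) raises a TypeError when an object column contains a non-string value such as None.

```python
import pandas as pd


def strip_all(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with leading/trailing whitespace removed from every str cell."""
    def strip_cell(value):
        # Only str values are stripped; numbers, None/NaN and other objects pass through
        return value.strip() if isinstance(value, str) else value

    # apply() visits each column; Series.map() applies strip_cell to each value
    return df.apply(lambda col: col.map(strip_cell))


# Hypothetical data, including a missing value in a text column
df = pd.DataFrame({
    'Names': [' Sunny', 'Ginny '],
    'Age': [23, 44],
    'Gender': [' M', None],
})

clean = strip_all(df)
print(clean)
```

Returning a new DataFrame leaves df unchanged, which makes it easy to compare the raw and cleaned data; assign the result back (df = strip_all(df)) to replace it.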
“We can have data without information, but we cannot have information without data.” How beautiful this quote is. Data is the backbone of a data scientist's work, and according to a survey, data scientists spend approximately 60% of their time cleaning and organizing data, so it is our responsibility to become familiar with different techniques for organizing data well.
In this article, we will learn different methods to strip extra whitespace from an entire DataFrame. The examples use a small student dataset with columns for names, age, blood group, and gender, whose text cells contain extra leading and trailing spaces.