Utilize Pandas Dataframe for data wrangling

Creating Dataframe using CSV

In this example, we load a CSV file into a DataFrame and display its first n rows (5 by default) using the DataFrame.head() method. The same method is also available on a Series.

Python3

# importing packages 
import pandas as pd 
  
# loading csv data 
population_data = pd.read_csv('employees.csv') 
  
# setting the index of the dataframe 
population_data=population_data.set_index('First Name') 
  
# head of the dataframe 
population_data.head()

Output:
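For readers who do not have employees.csv at hand, the same calls can be tried on a small in-memory DataFrame. This is only a sketch: the column names mirror those used in this article, and every row value is made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for employees.csv (values are made up)
employees = pd.DataFrame({
    "First Name": ["Ana", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus"],
    "Gender": ["Female", "Male", "Female", "Male", "Male", "Female", "Male"],
    "Salary": [52000, 61000, 58000, 47000, 73000, 66000, 50000],
})

# Use the 'First Name' column as the row index
employees = employees.set_index("First Name")

first_five = employees.head()   # no argument: first 5 rows
first_two = employees.head(2)   # explicit n: first 2 rows

print(first_five)
print(first_two)
```

Passing an integer to head() changes how many rows are returned; the original DataFrame is left untouched.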

Describing DataFrame

The DataFrame.describe() method returns summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns of the DataFrame.

Python3

# importing packages 
import pandas as pd 
  
# loading csv data 
population_data = pd.read_csv('employees.csv') 
  
population_data.describe()

Output:
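As a self-contained sketch of what describe() returns, the snippet below uses a tiny made-up DataFrame (the values are assumptions, not taken from employees.csv). It shows that, by default, only the numeric column is summarized.

```python
import pandas as pd

# Hypothetical data: one text column and one numeric column
employees = pd.DataFrame({
    "First Name": ["Ana", "Ben", "Cara", "Dev"],
    "Salary": [40000, 50000, 60000, 70000],
})

# describe() summarizes numeric columns by default
summary = employees.describe()
print(summary)

# Individual statistics can be read with .loc[row_label, column_label]
mean_salary = summary.loc["mean", "Salary"]
print(mean_salary)
```

The result is itself a DataFrame, so individual statistics can be picked out with ordinary label-based indexing.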

Setting and Resetting the index of the Dataframe

DataFrame.set_index() sets one or more existing columns as the index of the DataFrame; the name of the column is given as an argument. DataFrame.reset_index() reverts the DataFrame to the default integer index and moves the current index back into a regular column.

Example 1: Setting the index of the DataFrame to the Start Date column

Python3

# importing packages 
import pandas as pd 
  
# creating a pandas Dataframe 
population_data = pd.read_csv('employees.csv') 
  
# setting the index of the dataframe 
population_data = population_data.set_index('Start Date') 
pd.DataFrame(population_data) 

Output:

Example 2: Resetting the index of the DataFrame and then setting it to the First Name column

Python3

# importing packages 
import pandas as pd 
  
# creating a pandas Dataframe 
population_data = pd.read_csv('employees.csv') 
  
  
# resetting the index of the dataframe 
# (on a freshly loaded frame this adds an 'index' column holding the old integer index) 
population_data = population_data.reset_index() 
  
population_data = population_data.set_index('First Name') 
  
pd.DataFrame(population_data) 

Output:
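The round trip between set_index() and reset_index() can be sketched on a small in-memory DataFrame. The rows below are invented for illustration; only the column names follow the article.

```python
import pandas as pd

# Hypothetical rows standing in for employees.csv
employees = pd.DataFrame({
    "First Name": ["Ana", "Ben", "Cara"],
    "Start Date": ["2019-03-01", "2020-07-15", "2021-01-10"],
    "Salary": [52000, 61000, 58000],
})

# Move the 'Start Date' column into the index
by_date = employees.set_index("Start Date")
print(by_date)

# Move the index back into a column and restore the default 0..n-1 index
restored = by_date.reset_index()
print(restored)
```

Both methods return a new DataFrame by default, which is why the results are assigned to new names here.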

Deleting a column from the DataFrame

Here the 'Salary' column is deleted from the DataFrame loaded from our CSV file, using Python's del statement.

Python3

# importing packages 
import pandas as pd 
  
# loading csv data 
population_data = pd.read_csv('employees.csv') 
  
  
# deleting column 
del population_data['Salary'] 
                      
pd.DataFrame(population_data.head())

Output:
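Besides del, pandas also offers DataFrame.drop(), which returns a new DataFrame instead of modifying the existing one. The sketch below contrasts the two on made-up data; the rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for employees.csv
employees = pd.DataFrame({
    "First Name": ["Ana", "Ben", "Cara"],
    "Gender": ["Female", "Male", "Female"],
    "Salary": [52000, 61000, 58000],
})

# drop() returns a copy without the column; 'employees' is unchanged
trimmed = employees.drop(columns=["Salary"])
print(trimmed.columns.tolist())

# del removes the column from 'employees' itself
del employees["Gender"]
print(employees.columns.tolist())
```

drop() is convenient when the original DataFrame is still needed later, while del is a compact way to discard a column for good.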

Reshaping dataframe

The df.transpose() method (also available as the df.T attribute) returns the transpose of the given DataFrame, swapping its rows and columns.

Python3

# importing packages 
import pandas as pd 
  
# loading csv data 
population_data = pd.read_csv('employees.csv') 
  
# setting the index of the dataframe 
population_data=population_data.set_index('First Name') 
  
# displaying a transpose of the dataframe 
pd.DataFrame(population_data.transpose().head()) 

Output:
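The effect of transposing is easiest to see on a very small DataFrame. The following sketch uses invented rows; after the transpose, the former index labels become column labels and the former column names become the index.

```python
import pandas as pd

# Hypothetical rows, indexed by first name
employees = pd.DataFrame({
    "First Name": ["Ana", "Ben", "Cara"],
    "Gender": ["Female", "Male", "Female"],
    "Salary": [52000, 61000, 58000],
}).set_index("First Name")

# Rows become columns and columns become rows
transposed = employees.transpose()
print(transposed)

# The .T attribute is shorthand for the same operation
same_result = employees.T.equals(transposed)
print(same_result)
```

A 3-row, 2-column DataFrame becomes a 2-row, 3-column one.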

Sorting the Dataframe

The df.sort_values() method sorts a DataFrame by the values of one or more columns. The column name is passed as the first argument, and ascending=False sorts in descending order.

Python3

# importing packages 
import pandas as pd 
  
# loading csv data 
population_data = pd.read_csv('employees.csv') 
  
# setting the index of the dataframe 
population_data=population_data.set_index('First Name') 
  
# sorting the DataFrame by the Salary column in descending order 
sorted_dataframe = population_data.sort_values('Salary', ascending=False) 
pd.DataFrame(sorted_dataframe)

Output:
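sort_values() also accepts a list of columns, with a matching list for ascending. The sketch below shows both forms on made-up rows; the 'Team' column and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for employees.csv
employees = pd.DataFrame({
    "First Name": ["Ana", "Ben", "Cara", "Dev"],
    "Team": ["Sales", "Finance", "Sales", "Finance"],
    "Salary": [52000, 61000, 58000, 47000],
})

# Single column, highest salary first
sorted_desc = employees.sort_values("Salary", ascending=False)
print(sorted_desc)

# Team A-Z, and within each team highest salary first
sorted_multi = employees.sort_values(["Team", "Salary"], ascending=[True, False])
print(sorted_multi)
```

Sorting keeps the original index labels attached to their rows, so the index of the result is no longer in ascending order.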

Dealing with missing values

Missing or null values can be checked with the Pandas df.isnull() method. Chaining .sum() onto the result counts the null values in each column.

Python3

# importing packages 
import pandas as pd 
  
# loading csv data 
data = pd.read_csv('employees.csv') 
  
# checking for null values 
data.isnull().sum()

Output:
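To see what isnull().sum() produces, here is a sketch on a small DataFrame with a few deliberately missing entries. The data is invented for illustration.

```python
import pandas as pd

# Hypothetical rows with some missing entries (None becomes a null value)
employees = pd.DataFrame({
    "First Name": ["Ana", None, "Cara"],
    "Salary": [52000, None, None],
    "Team": ["Sales", "Finance", "Sales"],
})

# isnull() gives a True/False table; sum() counts the True values per column
null_counts = employees.isnull().sum()
print(null_counts)

# Total number of missing cells in the whole DataFrame
total_missing = int(null_counts.sum())
print(total_missing)
```

The per-column counts make it easy to spot which columns need cleaning before further analysis.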

Dropping Rows

We can drop rows that contain null values using the df.dropna() method. With axis=0 and how='any', a row is removed if at least one of its values is null.

Python3

# importing packages 
import pandas as pd 
  
# loading csv data 
data = pd.read_csv('employees.csv') 
  
# dropping NA values 
data = data.dropna(axis=0, how='any') 
  
# checking for null values 
data.isnull().sum() 

Output:
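The how and subset parameters of dropna() control how aggressively rows are removed. The following sketch compares them on made-up rows; the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical rows: one complete, one partly missing, one entirely missing
employees = pd.DataFrame({
    "First Name": ["Ana", "Ben", None],
    "Salary": [52000, None, None],
})

# Drop a row if ANY value in it is null
any_dropped = employees.dropna(axis=0, how="any")

# Drop a row only if ALL of its values are null
all_dropped = employees.dropna(axis=0, how="all")

# Only look at the 'First Name' column when deciding what to drop
name_required = employees.dropna(subset=["First Name"])

print(len(any_dropped), len(all_dropped), len(name_required))
```

how='any' is the strictest option; how='all' and subset keep more rows when only some columns matter.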

Grouping Data

In data analysis, grouping a data set is a common requirement when results must be reported per group. Pandas provides a built-in mechanism for splitting data into categories: the df.groupby() method.

In the code below, we load the employees data and group it by the 'First Name' and 'Gender' columns. Calling first() on the grouped object then returns the first non-null value of each remaining column within every group.

Python3

# importing pandas as pd 
import pandas as pd 
  
# Creating the dataframe 
df = pd.read_csv("employees.csv") 
  
# First grouping based on "First Name" 
# Within each first name we are grouping based on "Gender" 
data = df.groupby(['First Name', 'Gender']) 
  
# Print the first value in each group 
data.first() 

Output:
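Grouping is usually followed by an aggregation. The sketch below groups made-up rows by a 'Team' column (an assumed column name, with invented values) and shows both first() and a per-group mean.

```python
import pandas as pd

# Hypothetical rows standing in for employees.csv
employees = pd.DataFrame({
    "First Name": ["Ana", "Ben", "Cara", "Dev"],
    "Team": ["Sales", "Finance", "Sales", "Finance"],
    "Gender": ["Female", "Male", "Female", "Male"],
    "Salary": [52000, 61000, 58000, 47000],
})

# Group by two columns; the result is indexed by (Team, Gender)
grouped = employees.groupby(["Team", "Gender"])
first_rows = grouped.first()
print(first_rows)

# Group by one column and average the salaries in each group
mean_salary = employees.groupby("Team")["Salary"].mean()
print(mean_salary)
```

The grouping columns become the index of the result, so individual groups can be looked up by label.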

Merging Dataframe

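The pd.merge() function (also available as the df.merge() method) merges two DataFrames in a database-style join. The how parameter selects the join type: 'inner' (the default), 'outer', 'left', 'right', or 'cross'. When no key is given, the columns common to both DataFrames are used as the join keys.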

Python3

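import pandas as pd 
# display() is built in to Jupyter notebooks; this import makes it available elsewhere 
from IPython.display import display 
  
# reading the csv file 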
data = pd.read_csv('employees.csv') 
  
# creating two dataframe  
head_data = data.head() 
tail_data = data.tail() 
  
# get top 5 rows 
print("Head Data :") 
display(head_data) 
  
# get last 5 rows 
print("Tail Data :") 
display(tail_data) 
  
# merge dataframe 
merge_data = pd.merge(head_data, tail_data, how='outer' ) 
  
print("After merging: ") 
display(merge_data)

Output:
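The difference between join types shows up most clearly when the two DataFrames share only some keys. The sketch below uses two small invented tables joined on 'First Name'; all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical tables that share only some first names
salaries = pd.DataFrame({
    "First Name": ["Ana", "Ben", "Cara"],
    "Salary": [52000, 61000, 58000],
})
teams = pd.DataFrame({
    "First Name": ["Ben", "Cara", "Dev"],
    "Team": ["Finance", "Sales", "Finance"],
})

# Keep only names present in both tables
inner = pd.merge(salaries, teams, on="First Name", how="inner")

# Keep names from either table; missing values are filled with nulls
outer = pd.merge(salaries, teams, on="First Name", how="outer")

# Keep every row of the left table
left_join = pd.merge(salaries, teams, on="First Name", how="left")

print(inner)
print(outer)
print(left_join)
```

Naming the key explicitly with on makes the join condition visible, which helps when the two tables happen to share other column names.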

Concatenating Data

The pd.concat() function concatenates pandas objects along an axis (rows by default). Let's load two DataFrames and concatenate them.

Python3

import pandas as pd 
  
# reading two csv files 
data1 = pd.read_csv('employees.csv') 
data2 = pd.read_csv('borrower.csv') 
  
# concatenating the dataframes 
pd.DataFrame(pd.concat([data1,data2]))

Output:
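Two details of pd.concat() are worth trying on small data: how the index is handled, and what axis=1 does. This is a sketch on invented DataFrames, not the CSV files used above.

```python
import pandas as pd

# Hypothetical DataFrames with the same columns
first = pd.DataFrame({"First Name": ["Ana", "Ben"], "Salary": [52000, 61000]})
second = pd.DataFrame({"First Name": ["Cara"], "Salary": [58000]})

# Stack rows; each piece keeps its own index labels
stacked = pd.concat([first, second])
print(stacked)

# Stack rows and build a fresh 0..n-1 index
renumbered = pd.concat([first, second], ignore_index=True)
print(renumbered)

# axis=1 places DataFrames side by side, aligned on the index
teams = pd.DataFrame({"Team": ["Sales", "Finance"]})
side_by_side = pd.concat([first, teams], axis=1)
print(side_by_side)
```

ignore_index=True is often the safer choice when the original row labels carry no meaning, since it avoids duplicate index labels in the result.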

How to utilise Pandas dataframe and series for data wrangling?

In this article, we are going to see how to utilize Pandas DataFrame and series for data wrangling.

The process of cleaning and integrating messy, complex data sets for easy access and analysis is known as data wrangling. As the amount of data continues to grow, organizing large data sets for analysis becomes increasingly important. Data wrangling comprises activities such as data sorting, data filtering, data reduction, data access, and data processing, and it is one of the most important tasks in data science and data analysis. Let's see how to utilize Pandas DataFrame and Series for data wrangling.
