GroupBy in Python Pandas
A groupby operation in Pandas splits an object into groups, applies a function to each group, and then combines the results.
After grouping the rows by the columns of our choice, we can perform various operations on each group, which helps in analyzing the data.
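The split, apply and combine steps can be spelled out by hand to see what groupby does. The sketch below uses a small hypothetical `sales` table (the `city` and `units` columns are assumptions for illustration, not data from this article) and compares the manual steps with a single groupby call.

```python
import pandas as pd

# Hypothetical toy data, not taken from the article.
sales = pd.DataFrame(
    {
        "city": ["Oslo", "Rome", "Oslo", "Rome", "Oslo"],
        "units": [3, 5, 4, 1, 2],
    }
)

# Split: one sub-DataFrame per distinct city.
pieces = {city: part for city, part in sales.groupby("city")}

# Apply: reduce each piece to a single number.
totals = {city: int(part["units"].sum()) for city, part in pieces.items()}

# Combine: assemble the per-group results into one Series.
manual = pd.Series(totals, name="units").rename_axis("city")

# groupby performs the same three steps in one expression.
direct = sales.groupby("city")["units"].sum()

print(manual)
print(direct)
```

Both Series hold the same totals per city; the groupby version is shorter and usually faster.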
Syntax
DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)
Parameters
- by: The column label, or list of column labels, to group by.
- axis: The axis to split along. The default value is 0, where 0 stands for index (rows) and 1 stands for columns.
- level: If the DataFrame has hierarchical indexing (a MultiIndex), level selects the index level or levels to group by.
- as_index: A boolean with a default value of True. For aggregated output, it returns an object with the group labels as the index.
- sort: A boolean with a default value of True that sorts the group keys. Setting it to False can improve performance.
- group_keys: A boolean with a default value of True. When calling apply, it adds the group keys to the index to identify the pieces.
- dropna: A boolean with a default value of True. When True, rows whose group key is NA are dropped; when False, NA is kept as a group key of its own.
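The effect of as_index, sort and dropna is easiest to see side by side. The following is a minimal sketch on a hypothetical DataFrame (the `team` and `points` columns are assumptions made for illustration), with one missing group key.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing group key.
df = pd.DataFrame(
    {
        "team": ["b", "a", np.nan, "a"],
        "points": [1, 2, 3, 4],
    }
)

# Defaults: keys become the index, are sorted, and the NaN key is dropped.
default = df.groupby("team").sum()

# as_index=False keeps "team" as an ordinary column.
flat = df.groupby("team", as_index=False).sum()

# sort=False keeps groups in order of first appearance ("b" before "a").
unsorted = df.groupby("team", sort=False).sum()

# dropna=False keeps the rows with a missing key as their own group.
with_na = df.groupby("team", dropna=False).sum()

print(default, flat, unsorted, with_na, sep="\n\n")
```

With the defaults, the row whose key is missing does not contribute to any group; with dropna=False it forms a third group.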
Example 1: Using Groupby with DataFrame
First, let’s create a DataFrame on which we will perform the groupby operation.
# importing pandas library
import pandas as pd
# Creating pandas dataframe
df = pd.DataFrame(
    [
        ("Corona Positive", 65, 99),
        ("Corona Negative", 52, 98.7),
        ("Corona Positive", 43, 100.1),
        ("Corona Positive", 26, 99.6),
        ("Corona Negative", 30, 98.1),
    ],
    index=["Patient 1", "Patient 2", "Patient 3",
           "Patient 4", "Patient 5"],
    columns=("Status", "Age(in Years)", "Temperature"),
)
# show dataframe
print(df)
Now let us group the rows by some of the columns:
# Grouping with only status
grouped1 = df.groupby("Status")
# Grouping with temperature and status
grouped3 = df.groupby(["Temperature", "Status"])
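A groupby object does not print its contents, so it can help to inspect the groups directly. The sketch below rebuilds the same DataFrame so that it runs on its own, then uses `size()`, `groups`, `get_group()` and `ngroups` to look inside the grouped objects.

```python
import pandas as pd

# Same data as the DataFrame built in Example 1.
df = pd.DataFrame(
    [
        ("Corona Positive", 65, 99),
        ("Corona Negative", 52, 98.7),
        ("Corona Positive", 43, 100.1),
        ("Corona Positive", 26, 99.6),
        ("Corona Negative", 30, 98.1),
    ],
    index=["Patient 1", "Patient 2", "Patient 3", "Patient 4", "Patient 5"],
    columns=["Status", "Age(in Years)", "Temperature"],
)

grouped1 = df.groupby("Status")
grouped3 = df.groupby(["Temperature", "Status"])

# Number of rows in each group.
sizes = grouped1.size()
print(sizes)

# Row labels that belong to each group.
for status, labels in grouped1.groups.items():
    print(status, list(labels))

# The sub-DataFrame for a single group.
positive = grouped1.get_group("Corona Positive")
print(positive)

# Every (Temperature, Status) pair is unique here, so each row is its own group.
print(grouped3.ngroups)
```

Grouping by a column with mostly unique values, such as Temperature here, produces one group per row, which is rarely useful for aggregation.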
We have now grouped the data by 'Status' alone and by 'Temperature' and 'Status' together. Next, let us apply some functions to these groups.
Example: Finding the mean of a Group
This computes the mean of the numeric columns for each value of 'Status'.
# Finding the mean of the
# patients' reports for
# each status
grouped1.mean()
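To make the result concrete, the sketch below rebuilds the same data as a standalone DataFrame and computes the group means. Passing `numeric_only=True` is an addition here, not part of the original example; it guards against errors if a non-numeric column is added later.

```python
import pandas as pd

# Same values as the DataFrame from Example 1, built from a dict.
df = pd.DataFrame(
    {
        "Status": [
            "Corona Positive",
            "Corona Negative",
            "Corona Positive",
            "Corona Positive",
            "Corona Negative",
        ],
        "Age(in Years)": [65, 52, 43, 26, 30],
        "Temperature": [99, 98.7, 100.1, 99.6, 98.1],
    },
    index=["Patient 1", "Patient 2", "Patient 3", "Patient 4", "Patient 5"],
)

grouped1 = df.groupby("Status")

# Mean of every numeric column, one row per status.
means = grouped1.mean(numeric_only=True)
print(means.round(2))

# Selecting a column first returns a Series instead of a DataFrame.
age_mean = grouped1["Age(in Years)"].mean()
print(age_mean)
```

The two "Corona Negative" patients are aged 52 and 30, so their mean age is 41.0.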
Pandas – Multi-index and Groupby Tutorial
Multi-index and Groupby are two important concepts in data manipulation. A Multi-index lets you represent data with multiple levels of indexing, creating a hierarchy in the rows and columns.
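As a minimal sketch of a hierarchical index, the example below builds a two-level row index with `pd.MultiIndex.from_tuples` and selects from it. The `region`, `year` and `revenue` names and values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical two-level row index: (region, year).
index = pd.MultiIndex.from_tuples(
    [("North", 2022), ("North", 2023), ("South", 2022), ("South", 2023)],
    names=["region", "year"],
)
sales = pd.DataFrame({"revenue": [10, 12, 7, 9]}, index=index)
print(sales)

# Select every row under one outer label.
north = sales.loc["North"]

# Select a single value by giving a label for each level.
south_2023 = sales.loc[("South", 2023), "revenue"]

# Cross-section on the inner level: all regions for one year.
year_2022 = sales.xs(2022, level="year")

print(north, south_2023, year_2022, sep="\n\n")
```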
Groupby lets you create groups of similar data and apply aggregate functions (e.g., mean, sum, count, standard deviation) to each group, condensing large datasets into meaningful summaries.
Using both tools together lets you analyze the data from different perspectives.
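One way the two tools meet: grouping by more than one column returns a result indexed by a Multi-index, and that result can be grouped again by one of its levels. The sketch below assumes a hypothetical `orders` table; the column names and values are illustrative only.

```python
import pandas as pd

# Hypothetical order data.
orders = pd.DataFrame(
    {
        "region": ["North", "North", "South", "South", "North"],
        "product": ["pen", "ink", "pen", "pen", "pen"],
        "amount": [4.0, 10.0, 3.0, 5.0, 6.0],
    }
)

# Grouping by two columns gives a result with a two-level MultiIndex,
# and agg applies several aggregate functions at once.
summary = orders.groupby(["region", "product"])["amount"].agg(
    ["mean", "sum", "count"]
)
print(summary)

# level= groups by an existing index level instead of a column.
per_region = summary.groupby(level="region")["sum"].sum()
print(per_region)
```

Here `summary` has one row per (region, product) pair, while `per_region` rolls the same figures up to one row per region.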
In this article, we will discuss Multi-index for Pandas DataFrames and Groupby operations.