Flattening hierarchical index in pandas dataframe using groupby
Whenever we use the groupby function on a single column with multiple aggregation functions, the resulting dataframe has hierarchical (multi-level) column labels: the first level holds the original column name and the second level holds the aggregation function. In such cases, both levels have to be flattened into a single level of column names.
Syntax: pandas.DataFrame.groupby(by=None, axis=0, level=None)
Parameter:
- by – mapping, function, label, or list of labels that determines the groups.
- axis – 0 splits along rows and 1 splits along columns (this keyword is deprecated since pandas 2.1).
- level – if the axis is multi-indexed, groups by the specified level (int or level name).
Syntax: pandas.DataFrame.agg(func=None, axis=0)
Parameter:
- func – the function or functions to use for aggregation (min, max, sum, etc.).
- axis – 0 applies the function to each column and 1 applies it to each row.
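To make the two signatures above concrete, here is a minimal sketch of how groupby followed by agg with several functions produces hierarchical columns. The dataframe and its values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data; the values are illustrative only.
data = pd.DataFrame({
    "cars": ["bmw", "bmw", "benz", "benz"],
    "sale_q1 in Cr": [20, 22, 24, 26],
    "sale_q2 in Cr": [11, 13, 9, 14],
})

# Several aggregations per column give two-level column labels.
grouped = data.groupby(by="cars").agg(
    {"sale_q1 in Cr": ["sum", "max"], "sale_q2 in Cr": ["sum", "min"]}
)

# The columns are a MultiIndex whose entries are (column, aggregation) tuples.
is_multi = isinstance(grouped.columns, pd.MultiIndex)
column_tuples = grouped.columns.tolist()
print(is_multi)
print(column_tuples)
```

Each tuple pairs the original column name with the name of the aggregation applied to it, which is what the flattening step has to join.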
Example
- Import the Python pandas package.
- Create a sample dataframe showing the car sales in two quarters, q1 and q2.
- Use the pandas groupby function to group by cars, computing the sum and max of sales in quarter 1 and the sum and min of sales in quarter 2.
- The grouped dataframe has multi-indexed columns, stored as a list of tuples. Use a for loop to iterate through the tuples and join each one into a single string.
- Append the joined strings to the flat_cols list.
- Assign the flat_cols list to the columns of the grouped dataframe.
Python3
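import pandas as pd

# sample car sales for two quarters
# (illustrative values, assumed here so the snippet runs on its own)
data = pd.DataFrame({
    "cars": ["bmw", "bmw", "benz", "benz"],
    "sale_q1 in Cr": [20, 22, 24, 26],
    "sale_q2 in Cr": [11, 13, 9, 14]})

# group by cars and compute the sum and max of
# quarter 1 sales and the sum and min of quarter 2 sales
grouped_data = data.groupby(by="cars").agg(
    {"sale_q1 in Cr": ["sum", "max"],
     "sale_q2 in Cr": ["sum", "min"]})

# create an empty list to save the
# names of the flattened columns
flat_cols = []

# iterate through the column tuples and
# join each one into a single string
for i in grouped_data.columns:
    flat_cols.append(i[0] + '_' + i[1])

# now assign the list of flattened
# columns to the grouped columns
grouped_data.columns = flat_cols

# print the grouped data
print(grouped_data)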
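Output: the grouped dataframe is indexed by cars and has the flattened column names sale_q1 in Cr_sum, sale_q1 in Cr_max, sale_q2 in Cr_sum and sale_q2 in Cr_min.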
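The loop-and-append steps can also be written as a single list comprehension. The sketch below is one possible variant, not the article's own code, and it again assumes a small hypothetical dataframe with illustrative values.

```python
import pandas as pd

# Hypothetical sample data; the values are illustrative only.
data = pd.DataFrame({
    "cars": ["bmw", "bmw", "benz", "benz"],
    "sale_q1 in Cr": [20, 22, 24, 26],
    "sale_q2 in Cr": [11, 13, 9, 14],
})

grouped = data.groupby(by="cars").agg(
    {"sale_q1 in Cr": ["sum", "max"], "sale_q2 in Cr": ["sum", "min"]}
)

# Work on a copy so the hierarchical version stays available for comparison.
flat = grouped.copy()

# Each column label is a (column, aggregation) tuple; join it with "_".
flat.columns = ["_".join(col) for col in grouped.columns]
print(flat)
```

Joining with "_" assumes every level value is a string, which holds here because the aggregations are passed by name.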
How to flatten a hierarchical index in Pandas DataFrame columns?
In this article, we are going to see how to flatten a hierarchical index in Pandas DataFrame columns. A hierarchical index usually occurs as a result of groupby() aggregation functions: the aggregation function used appears as a level in the hierarchical index of the resulting dataframe.
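As a rough illustration of that last point, the sketch below reads the aggregation names from the second column level and then flattens the columns with MultiIndex.to_flat_index(). The dataframe is hypothetical and its values are illustrative only.

```python
import pandas as pd

# Hypothetical sample data; the values are illustrative only.
data = pd.DataFrame({
    "cars": ["bmw", "bmw", "benz", "benz"],
    "sale_q1 in Cr": [20, 22, 24, 26],
    "sale_q2 in Cr": [11, 13, 9, 14],
})

grouped = data.groupby(by="cars").agg(
    {"sale_q1 in Cr": ["sum", "max"], "sale_q2 in Cr": ["sum", "min"]}
)

# Level 1 of the column index holds the aggregation function names.
agg_names = grouped.columns.get_level_values(1).tolist()

# to_flat_index() turns the MultiIndex into a plain Index of tuples,
# which can then be mapped to single strings.
flat_names = grouped.columns.to_flat_index().map("_".join)

print(agg_names)
print(list(flat_names))
```

Assigning flat_names back to grouped.columns would leave the dataframe with single-level column labels.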