Find Duplicate Columns from a DataFrame
To find duplicate columns, we iterate over every column of the DataFrame and, for each one, check whether any later column has exactly the same contents. If it does, the name of that later column is added to a set of duplicate column names. At the end, the function returns this set as a list. Note that this approach compares column values, not column labels, so it finds columns with duplicate contents even when their names differ.
Python3
import pandas as pd


def getDuplicateColumns(df):
    # Create an empty set to hold the names of duplicate columns
    duplicateColumnNames = set()

    # Iterate through all the columns of the DataFrame
    for x in range(df.shape[1]):
        # Take the column at index x
        col = df.iloc[:, x]

        # Iterate through all the columns after index x
        for y in range(x + 1, df.shape[1]):
            # Take the column at index y
            otherCol = df.iloc[:, y]

            # Check whether the two columns at x and y have equal contents
            if col.equals(otherCol):
                duplicateColumnNames.add(df.columns.values[y])

    return list(duplicateColumnNames)


# Driver code
if __name__ == "__main__":
    # List of tuples
    students = [
        ('Ankit', 34, 'Uttar pradesh', 34),
        ('Riti', 30, 'Delhi', 30),
        ('Aadi', 16, 'Delhi', 16),
        ('Riti', 30, 'Delhi', 30),
        ('Riti', 30, 'Delhi', 30),
        ('Riti', 30, 'Mumbai', 30),
        ('Ankita', 40, 'Bihar', 40),
        ('Sachin', 30, 'Delhi', 30)
    ]

    # Create a DataFrame object
    df = pd.DataFrame(students,
                      columns=['Name', 'Age', 'Domicile', 'Marks'])

    # Get the list of duplicate columns
    duplicateColNames = getDuplicateColumns(df)
    for column in duplicateColNames:
        print('Column Name:', column)
Output:
Column Name: Marks
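As a shorter alternative to the nested loops, the same check can be sketched by transposing the DataFrame and using `DataFrame.duplicated()`, which flags rows that repeat an earlier row. This is only a minimal sketch: the helper name `find_duplicate_columns` is made up for illustration, and transposing copies the data (and converts mixed-type columns to `object`), so it may be slow on very large DataFrames.

```python
import pandas as pd


def find_duplicate_columns(df):
    # Hypothetical helper: after transposing, each original column
    # becomes a row, so duplicated() marks every column whose contents
    # repeat an earlier column.
    mask = df.T.duplicated()
    return list(df.columns[mask])


students = [
    ('Ankit', 34, 'Uttar pradesh', 34),
    ('Riti', 30, 'Delhi', 30),
    ('Aadi', 16, 'Delhi', 16),
    ('Ankita', 40, 'Bihar', 40),
]
df = pd.DataFrame(students, columns=['Name', 'Age', 'Domicile', 'Marks'])

duplicates = find_duplicate_columns(df)
print(duplicates)

# Keep only the first occurrence of each set of identical columns
deduped = df.loc[:, ~df.T.duplicated()]
print(list(deduped.columns))
```

Here 'Marks' is reported because its values match those of 'Age', and the boolean mask is inverted to keep only the columns that are not duplicates.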
How to Find & Drop Duplicate Columns in a Pandas DataFrame?
Let’s discuss how to find and drop duplicate columns in a Pandas DataFrame. First, let’s create a simple DataFrame with the column names ‘Name’, ‘Age’, ‘Domicile’, and ‘Age’/‘Marks’.