Find Duplicate Columns from a DataFrame

To find duplicate columns, we iterate through all the columns of a DataFrame and, for each column, check whether any later column has exactly the same contents. If one does, the name of that later column is stored in a set of duplicate column names. At the end, the function returns this set as a list of column names. In this way, we can find columns with duplicate contents in Pandas, even when their labels differ.

Python3




import pandas as pd
 
def getDuplicateColumns(df):
 
    # Create an empty set
    duplicateColumnNames = set()
 
    # Iterate through all the columns of dataframe
    for x in range(df.shape[1]):
 
        # Take column at xth index.
        col = df.iloc[:, x]
 
        # Iterate through all the columns
        for y in range(x + 1, df.shape[1]):
 
            # Take column at yth index.
            otherCol = df.iloc[:, y]
 
            # Check if the two columns at x & y have identical contents
            if col.equals(otherCol):
                duplicateColumnNames.add(df.columns.values[y])
 
    return list(duplicateColumnNames)
 
 
# Driver code
if __name__ == "__main__":
 
    # List of Tuples
    students = [
        ('Ankit', 34, 'Uttar pradesh', 34),
        ('Riti', 30, 'Delhi', 30),
        ('Aadi', 16, 'Delhi', 16),
        ('Riti', 30, 'Delhi', 30),
        ('Riti', 30, 'Delhi', 30),
        ('Riti', 30, 'Mumbai', 30),
        ('Ankita', 40, 'Bihar', 40),
        ('Sachin', 30, 'Delhi', 30)
    ]
 
    # Create a DataFrame object
    df = pd.DataFrame(students, columns=['Name', 'Age',
                                     'Domicile', 'Marks'])
 
    # Get list of duplicate columns
    duplicateColNames = getDuplicateColumns(df)
 
    for column in duplicateColNames:
        print('Column Name : ', column)


Output:

Column Name :  Marks
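
The pairwise loop above can also be written more compactly. The following is a minimal sketch, not part of the original function: it assumes that transposing the DataFrame and calling `DataFrame.duplicated()` is acceptable for your data, and the helper name `duplicate_columns_via_transpose` is made up for illustration. Unlike `Series.equals`, the transposed comparison may treat columns with equal values but different dtypes as duplicates, and the transpose can be expensive on large frames.

```python
import pandas as pd


def duplicate_columns_via_transpose(df):
    """Return names of columns whose contents repeat an earlier column.

    Hypothetical helper for illustration only.
    """
    # Transposing turns columns into rows, so duplicated() compares
    # whole columns; keep="first" flags every repeat after the first.
    mask = df.T.duplicated(keep="first")
    return df.columns[mask.to_numpy()].tolist()


students = [
    ('Ankit', 34, 'Uttar pradesh', 34),
    ('Riti', 30, 'Delhi', 30),
    ('Aadi', 16, 'Delhi', 16),
    ('Ankita', 40, 'Bihar', 40),
]

df = pd.DataFrame(students, columns=['Name', 'Age', 'Domicile', 'Marks'])

print(duplicate_columns_via_transpose(df))  # ['Marks']
```

Because the result follows the column order of the DataFrame, the returned list is ordered, whereas converting a set to a list does not guarantee any particular order.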

How to Find & Drop duplicate columns in a Pandas DataFrame?

Let's discuss how to find and drop duplicate columns in a Pandas DataFrame. First, let's create a simple DataFrame with the column names 'Name', 'Age', 'Domicile', and 'Age'/'Marks'.
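
As a rough sketch of that setup, the snippet below builds a small DataFrame, finds the columns that repeat an earlier column, and drops them. The sample rows are shortened from the student data used earlier, the fourth column is labelled 'Marks' here, and the variable names `duplicate_names` and `deduplicated` are chosen only for illustration.

```python
import pandas as pd

# Small sample frame; 'Marks' deliberately repeats the contents of 'Age'
df = pd.DataFrame(
    {
        "Name": ["Ankit", "Riti", "Aadi"],
        "Age": [34, 30, 16],
        "Domicile": ["Uttar pradesh", "Delhi", "Delhi"],
        "Marks": [34, 30, 16],
    }
)

# Collect every column whose contents equal those of an earlier column
duplicate_names = [
    name
    for i, name in enumerate(df.columns)
    if any(df.iloc[:, i].equals(df.iloc[:, j]) for j in range(i))
]

# Drop the duplicates by label; this assumes the labels themselves are unique
deduplicated = df.drop(columns=duplicate_names)

print(duplicate_names)                # ['Marks']
print(deduplicated.columns.tolist())  # ['Name', 'Age', 'Domicile']
```

Note that `drop(columns=...)` removes every column carrying a given label, so this approach is only safe when the duplicate columns have distinct names.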
