Remove the duplicate columns before merging two dataframes

In this method, we first call the difference() function on the column labels to find the columns of the second dataframe that do not already appear in the first. We then keep only those columns and call the merge() function to join the two dataframes, so that each column name appears only once in the result.

Difference function:

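This function returns a set that contains the items that exist in the first set but not in the second. A pandas Index, such as DataFrame.columns, offers a difference() method with the same meaning.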

Syntax:

set.difference(set)

Parameters:

set : The set whose items are excluded from the result
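To make the behaviour concrete, here is a minimal sketch of difference() applied to column labels. The two Index objects and their labels are illustrative assumptions that mirror the sample dataframes used in this article.

```python
import pandas as pd

# Hypothetical column labels for a left and a right dataframe
left_cols = pd.Index(['EMI', 'Salary', 'Debt'])
right_cols = pd.Index(['Salary', 'Debt', 'Bonus'])

# Labels that appear in right_cols but not in left_cols
only_in_right = right_cols.difference(left_cols)

# The operation is not symmetric: swapping the operands
# gives the labels that appear only in left_cols
only_in_left = left_cols.difference(right_cols)

print(list(only_in_right))  # ['Bonus']
print(list(only_in_left))   # ['EMI']
```

Note that the order of the operands matters: the result always contains labels from the object the method is called on.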

Example:

In this example, we use the difference function to find the columns of data2 that are not present in data1 and store them as a new dataframe. We then use the pd.merge() function to join the left dataframe with this dataframe of unique columns on the index, using an 'inner' join. This ensures that no columns are duplicated in the merged dataset.

Python3

# import python pandas package
import pandas as pd
 
# import the numpy package
import numpy as np
 
# Create sample dataframe data1 and data2
data1 = pd.DataFrame(np.random.randint(100, size=(1000, 3)),
                     columns=['EMI', 'Salary', 'Debt'])
data2 = pd.DataFrame(np.random.randint(100, size=(1000, 3)),
                     columns=['Salary', 'Debt', 'Bonus'])
 
# Find the columns of data2 that aren't in data1
different_cols = data2.columns.difference(data1.columns)
 
# Keep only the columns that are unique to data2.
# You could also pass data2[different_cols]
# directly into the merge.
data3 = data2[different_cols]
 
# Merge the DataFrames
df_merged = pd.merge(data1, data3, left_index=True,
                     right_index=True, how='inner')

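Output:

The merged dataframe contains the four columns EMI, Salary, Debt and Bonus. Salary and Debt appear only once because they are taken from data1 alone; the values themselves are random.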

Prevent duplicated columns when joining two Pandas DataFrames

Column duplication usually occurs when the two dataframes have columns with the same name and those columns are not used as join keys. By default, pandas then keeps both copies and distinguishes them with the suffixes _x and _y. In this article, let us discuss three different methods to prevent duplication of columns when joining two dataframes.
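The problem can be reproduced with a small sketch. The dataframes, column names and values below are made up for illustration; only pd.merge() and its default suffix behaviour come from pandas itself.

```python
import pandas as pd

# Two hypothetical dataframes that share the key 'ID'
# and also both contain a 'Salary' column
left = pd.DataFrame({'ID': [1, 2], 'Salary': [100, 200]})
right = pd.DataFrame({'ID': [1, 2], 'Salary': [110, 210], 'Bonus': [5, 10]})

# 'Salary' is not part of the join key, so both copies are kept
# and renamed with the default suffixes
merged = pd.merge(left, right, on='ID')

print(list(merged.columns))  # ['ID', 'Salary_x', 'Salary_y', 'Bonus']
```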

Syntax: pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None)

Explanation:

left – The dataframe on the left side of the join.

right – The dataframe on the right side of the join.

how – Specifies the type of join: 'left', 'right', 'outer', 'inner' or 'cross'.

on – Column names to join on. They must exist in both dataframes.

left_on – Column names to join on in the left DataFrame.

right_on – Column names to join on in the right DataFrame.
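As a rough illustration of how these parameters interact, the sketch below joins two small dataframes whose key columns have different names. The names employees, bonuses, emp_id and staff_id are assumptions invented for this example.

```python
import pandas as pd

# Hypothetical dataframes whose key columns are named differently
employees = pd.DataFrame({'emp_id': [1, 2, 3], 'Salary': [50, 60, 70]})
bonuses = pd.DataFrame({'staff_id': [2, 3, 4], 'Bonus': [5, 6, 7]})

# An inner join keeps only the keys found in both dataframes (2 and 3)
inner = pd.merge(employees, bonuses, how='inner',
                 left_on='emp_id', right_on='staff_id')

# An outer join keeps every key from either dataframe (1, 2, 3 and 4)
outer = pd.merge(employees, bonuses, how='outer',
                 left_on='emp_id', right_on='staff_id')

print(len(inner))  # 2
print(len(outer))  # 4
```

When the key column has the same name in both dataframes, on can be used instead of the left_on and right_on pair.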
