merge() function from base R
The merge() function in base R combines two data frames by matching rows on one or more common key columns, similar to a join in SQL.
Syntax: merged_df <- merge(x = dataframe1, y = dataframe2, by = "common_key", all = FALSE)
Where:
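- x and y: the two data frames to be merged. x is the left data frame and y is the right one.
- by: the column name(s) shared by both data frames that act as the join key(s). If by is omitted, merge() joins on all columns whose names appear in both data frames.
- all: a logical value that controls which rows are kept. all = FALSE (the default) keeps only rows whose keys match in both data frames, which is an inner join. all = TRUE keeps every row from both data frames (a full outer join). The related arguments all.x = TRUE and all.y = TRUE produce left and right outer joins.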
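To make the effect of all, all.x and all.y concrete, here is a minimal sketch on two small made-up data frames. The names left, right, key, x and y are illustrative only and do not come from the article. Rows without a match are filled with NA in the columns that come from the other data frame.

```r
# Illustrative data: two small data frames sharing the key column "key"
left  <- data.frame(key = c(1, 2, 3), x = c("a", "b", "c"))
right <- data.frame(key = c(2, 3, 4), y = c("B", "C", "D"))

inner <- merge(left, right, by = "key")                # all = FALSE (default): keys 2, 3
full  <- merge(left, right, by = "key", all = TRUE)    # full outer join: keys 1, 2, 3, 4
lft   <- merge(left, right, by = "key", all.x = TRUE)  # left join: keys 1, 2, 3
rgt   <- merge(left, right, by = "key", all.y = TRUE)  # right join: keys 2, 3, 4

print(inner)
print(full)  # y is NA for key 1, x is NA for key 4
```

Only the inner join drops keys that appear in just one of the two inputs.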
Let’s illustrate the inner join operation with a practical example:
# Create two example data frames
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("Johny", "Ali", "Boby", "Emilya"))
df1
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Age = c(30, 25, 40, 35),
                  Gender = c("Male", "Female", "Male", "Female"))
df2
Output:
ID Name
1 1 Johny
2 2 Ali
3 3 Boby
4 4 Emilya
ID Age Gender
1 2 30 Male
2 3 25 Female
3 4 40 Male
4 5 35 Female
Inner Join using merge()
Here we perform an inner join on a single column, ID.
# Perform an inner join on the "ID" column
merged_df <- merge(df1, df2, by = "ID", all = FALSE)
# Output the merged data frame
print(merged_df)
Output:
ID Name Age Gender
1 2 Ali 30 Male
2 3 Boby 25 Female
3 4 Emilya 40 Male
The result contains only the rows whose ID appears in both data frames: ID 1 (present only in df1) and ID 5 (present only in df2) are dropped.
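One way to confirm which keys an inner join keeps and drops is to compare the key columns directly. This sketch rebuilds df1 and df2 from the example above so it runs on its own; common_ids and dropped_ids are illustrative names.

```r
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("Johny", "Ali", "Boby", "Emilya"))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Age = c(30, 25, 40, 35),
                  Gender = c("Male", "Female", "Male", "Female"))

merged_df <- merge(df1, df2, by = "ID")

# Keys present in both inputs: exactly the IDs kept by the inner join
common_ids <- intersect(df1$ID, df2$ID)
# Keys present in only one input: dropped by the inner join
dropped_ids <- setdiff(union(df1$ID, df2$ID), merged_df$ID)

print(common_ids)   # 2 3 4
print(dropped_ids)  # 1 5
```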
Inner Join on Multiple Columns
Here we perform an inner join on multiple columns, ID and Name.
# Create two example data frames
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("Johny", "Ali", "Boby", "Emilya"))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Name = c("Ali", "Boby", "Emilya", "Jhonathan"),
                  Age = c(30, 25, 40, 35),
                  Gender = c("Male", "Female", "Male", "Female"))
# Perform an inner join on the "ID" and "Name" columns
merged_df <- merge(df1, df2, by = c("ID", "Name"), all = FALSE)
# Output the merged data frame
print(merged_df)
Output:
ID Name Age Gender
1 2 Ali 30 Male
2 3 Boby 25 Female
3 4 Emilya 40 Male
This merges the data frames on both the ID and Name columns, so a row is kept only when both values match. Set all, all.x or all.y as needed to perform other types of joins.
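Joining on both columns matters here because Name exists in both data frames. As a sketch using the same df1 and df2, joining on ID alone keeps both copies of Name, which merge() tells apart with its default suffixes .x and .y, while joining on c("ID", "Name") keeps a single Name column. When by is omitted, merge() uses all shared column names, which here gives the same result as by = c("ID", "Name"). The variable names by_id, by_both and by_default are illustrative.

```r
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("Johny", "Ali", "Boby", "Emilya"))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Name = c("Ali", "Boby", "Emilya", "Jhonathan"),
                  Age = c(30, 25, 40, 35),
                  Gender = c("Male", "Female", "Male", "Female"))

# Key on ID only: the shared, non-key Name column is duplicated with suffixes
by_id <- merge(df1, df2, by = "ID")
print(names(by_id))    # ID, Name.x, Name.y, Age, Gender

# Key on both columns: Name is part of the key, so it appears once
by_both <- merge(df1, df2, by = c("ID", "Name"))
print(names(by_both))  # ID, Name, Age, Gender

# No `by`: merge() joins on every shared column name (here ID and Name)
by_default <- merge(df1, df2)
```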
How to Perform Inner Join in R
When working with multiple datasets in R, you often need to combine them on common keys or variables to derive meaningful insights. The inner join is one of the fundamental data manipulation operations: it merges two datasets and keeps only the rows whose key values match in both. In this article, we explore the inner join operation in the R programming language.
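For intuition, the inner-join rule (keep a row only when its key has a match in the other dataset) can be written by hand in base R. The helper inner_join_by_hand below is hypothetical. It handles a single key column and assumes key values are unique in the second data frame. It is a sketch for understanding, not how merge() is implemented, and unlike merge() it keeps rows in the order of x rather than sorting by the key.

```r
# Hypothetical helper for illustration only: inner join on one key column,
# assuming each key value appears at most once in y
inner_join_by_hand <- function(x, y, key) {
  idx  <- match(x[[key]], y[[key]])  # row of y matching each row of x (NA if none)
  keep <- !is.na(idx)                # inner join: keep only rows that found a match
  out  <- cbind(x[keep, , drop = FALSE],
                y[idx[keep], setdiff(names(y), key), drop = FALSE])
  rownames(out) <- NULL              # reset row names to 1..n
  out
}

df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("Johny", "Ali", "Boby", "Emilya"))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Age = c(30, 25, 40, 35),
                  Gender = c("Male", "Female", "Male", "Female"))

by_hand <- inner_join_by_hand(df1, df2, "ID")
print(by_hand)
```

Because df1 is already sorted by ID, this sketch returns the same rows as merge(df1, df2, by = "ID").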
Table of Contents
- merge() function from base R
  - Inner Join using merge()
  - Inner Join on Multiple Columns
- Using dplyr to Perform Inner Join in R
  - Inner Join on a Single Column
  - Inner Join on Multiple Columns
- Conclusion
There are two main methods available:
- merge() function from base R
- inner_join() function from dplyr
An inner join in R can be performed with either base R’s merge() function or the inner_join() function from the dplyr package. Here is a detailed explanation of the syntax for both approaches: