Using dplyr to Perform Inner Join in R
The dplyr package provides the inner_join() function for performing an inner join in R. Its basic syntax is as follows.
Syntax: merged_df <- inner_join(dataframe1, dataframe2, by = "common_key")
Where,
- dataframe1 and dataframe2: These parameters specify the input dataframes to be merged.
- by: This parameter specifies the column(s) present in both dataframes whose values are used as the keys for matching rows. If by is omitted, dplyr joins on all columns that have the same name in both dataframes.
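The by argument also accepts a named character vector when the key column has a different name in each data frame. The sketch below uses two hypothetical tables, employees (key emp_id) and salaries (key id); the table names, column names, and values are made up for illustration.

```r
library(dplyr)

# Hypothetical tables whose key columns have different names
employees <- data.frame(emp_id = c(1, 2, 3),
                        Name = c("John", "Alice", "Bob"))
salaries <- data.frame(id = c(2, 3, 4),
                       Salary = c(50000, 62000, 58000))

# Match employees$emp_id against salaries$id;
# the key column in the result keeps the left-hand name (emp_id)
result <- inner_join(employees, salaries, by = c("emp_id" = "id"))
print(result)
```

In dplyr 1.1.0 and later, by = join_by(emp_id == id) is an equivalent way to write the same mapping.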
Inner Join on a Single Column
# Load the dplyr package
library(dplyr)
# Create two example data frames
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("John", "Alice", "Bob", "Emily"))
df1
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Age = c(30, 25, 40, 35),
                  Gender = c("Male", "Female", "Male", "Female"))
df2
# Perform a simple inner join on the "ID" column using dplyr
merged_df_simple <- inner_join(df1, df2, by = "ID")
# Output the merged data frame
print(merged_df_simple)
Output:
  ID  Name
1  1  John
2  2 Alice
3  3   Bob
4  4 Emily

  ID Age Gender
1  2  30   Male
2  3  25 Female
3  4  40   Male
4  5  35 Female

  ID  Name Age Gender
1  2 Alice  30   Male
2  3   Bob  25 Female
3  4 Emily  40   Male
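Here the dataframes are joined on the ID column. Only IDs 2, 3, and 4 appear in both dataframes, so the row for ID 1 (John) in df1 and the row for ID 5 in df2 are dropped.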
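One way to confirm which keys survive an inner join is to compare the joined IDs with the intersection of the two key columns, and to list the unmatched rows with dplyr's anti_join(). This sketch rebuilds df1 and df2 from the example above; it is a sanity check, not a step the join requires.

```r
library(dplyr)

df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("John", "Alice", "Bob", "Emily"))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Age = c(30, 25, 40, 35),
                  Gender = c("Male", "Female", "Male", "Female"))

merged_df_simple <- inner_join(df1, df2, by = "ID")

# IDs present in both inputs: the only keys an inner join can keep
shared_ids <- intersect(df1$ID, df2$ID)
print(shared_ids)  # [1] 2 3 4

# Every shared ID appears in the result, and nothing else does
print(setequal(merged_df_simple$ID, shared_ids))  # [1] TRUE

# Rows of df1 with no match in df2, i.e. the rows the inner join dropped
dropped <- anti_join(df1, df2, by = "ID")
print(dropped)
```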
Inner Join on Multiple Columns
# Load the dplyr package
library(dplyr)
# Create two example data frames with overlapping and non-overlapping columns
df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("John", "Alice", "Bob", "Emily"),
                  Age = c(30, 25, 40, 35))
df1
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Name = c("Alice", "Bob", "Emily", "Eve"),
                  Age = c(25, 40, 35, 45), # Adjusted Age values for matches
                  Gender = c("Male", "Female", "Male", "Female"))
df2
# Perform an inner join on multiple columns using dplyr
merged_df_multiple <- inner_join(df1, df2, by = c("ID", "Name", "Age"))
# Output the merged data frame
print(merged_df_multiple)
Output:
  ID  Name Age
1  1  John  30
2  2 Alice  25
3  3   Bob  40
4  4 Emily  35

  ID  Name Age Gender
1  2 Alice  25   Male
2  3   Bob  40 Female
3  4 Emily  35   Male
4  5   Eve  45 Female

  ID  Name Age Gender
1  2 Alice  25   Male
2  3   Bob  40 Female
3  4 Emily  35   Male
Here the inner join matches on the "ID", "Name", and "Age" columns simultaneously: a row is kept only when all three values agree in both dataframes. Adjust the by parameter to list the columns you want to join on.
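If by lists only some of the columns the two dataframes share, dplyr keeps both copies of the remaining shared columns and tells them apart with suffixes, .x for the left table and .y for the right by default. The sketch below reuses the data from the multi-column example but joins on ID alone; the custom suffix labels "_df1" and "_df2" are arbitrary choices for illustration.

```r
library(dplyr)

df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("John", "Alice", "Bob", "Emily"),
                  Age = c(30, 25, 40, 35))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Name = c("Alice", "Bob", "Emily", "Eve"),
                  Age = c(25, 40, 35, 45),
                  Gender = c("Male", "Female", "Male", "Female"))

# Join on ID only: Name and Age exist in both tables, so each gets a suffix
by_id <- inner_join(df1, df2, by = "ID")
print(names(by_id))

# The suffix argument controls the labels used for the duplicated columns
by_id_custom <- inner_join(df1, df2, by = "ID", suffix = c("_df1", "_df2"))
print(names(by_id_custom))
```

Because the names and ages agree for every matched ID in this data, the .x and .y columns hold identical values; listing all three key columns in by, as in the example above, avoids that duplication.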
How to Perform Inner Join in R
When working with multiple datasets in R, you often need to combine them on common keys or variables before you can analyze them together. The inner join is one of the fundamental data-manipulation operations for this: it merges two datasets on matching key values and keeps only the rows whose keys appear in both. In this article, we will explore the inner join operation in the R Programming Language.
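To make the definition concrete, the following base-R sketch computes an inner join by hand on two small hypothetical tables, left and right. It only illustrates the semantics and assumes each ID appears at most once in right; it is not how merge() or inner_join() are implemented.

```r
# Two small hypothetical tables that share the key column "ID"
left  <- data.frame(ID = c(1, 2, 3), Name = c("John", "Alice", "Bob"))
right <- data.frame(ID = c(2, 3, 4), Age = c(30, 25, 40))

# Step 1: keep only rows of `left` whose ID also occurs in `right`
kept <- left[left$ID %in% right$ID, ]

# Step 2: look up the matching `right` row for each kept ID
# (assumes IDs are unique in `right`)
kept$Age <- right$Age[match(kept$ID, right$ID)]
rownames(kept) <- NULL
print(kept)

# For comparison, base R's merge() returns the same matching rows
print(merge(left, right, by = "ID"))
```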
Table of Contents
- merge() function from base R
  - Inner Join using merge()
  - Inner Join on Multiple Columns
- Using dplyr to Perform Inner Join in R
  - Inner Join on a Single Column
  - Inner Join on Multiple Columns
- Conclusion
There are two main methods available:
- merge() function from base R
- inner_join() function from dplyr
An inner join in R can be performed with either base R's merge() function or the inner_join() function from dplyr. The syntax for both approaches is explained in detail in the sections that follow.
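For a quick side-by-side comparison, the sketch below runs both approaches on two small data frames modeled on the examples in this article. It relies on merge()'s default all = FALSE, which keeps only matching rows.

```r
library(dplyr)

df1 <- data.frame(ID = c(1, 2, 3, 4),
                  Name = c("John", "Alice", "Bob", "Emily"))
df2 <- data.frame(ID = c(2, 3, 4, 5),
                  Age = c(30, 25, 40, 35))

# Base R: merge() performs an inner join by default (all = FALSE)
base_result <- merge(df1, df2, by = "ID")

# dplyr: inner_join() keeps the same matching rows
dplyr_result <- inner_join(df1, df2, by = "ID")

print(base_result)
print(dplyr_result)
```

One difference to keep in mind: merge() sorts the result by the key columns by default, while inner_join() keeps the row order of the first data frame. Here the two orders coincide because the IDs are already sorted.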