Removing duplicate rows based on a single column

The distinct() function from dplyr removes duplicate rows. Pass the data frame and the name of the column to check for duplicates as arguments to distinct().

Note: We pass .keep_all = TRUE because the default is FALSE, in which case distinct() returns only the distinct values of the specified column. Setting it to TRUE keeps all the other columns as well, taking their values from the first row in which each distinct value occurs.

Syntax: distinct(df, column_name, .keep_all = TRUE)

Parameters:

df: data frame object

column_name: name of the column used to identify duplicate rows

.keep_all: if TRUE, keep all columns of df in the result; the default is FALSE

Example: R program to remove duplicate rows based on single column

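library(dplyr)

# Sample data frame; some lang values appear more than once
df <- data.frame(
  lang  = c("Java", "C", "Python", "GO", "RUST", "Javascript",
            "Cpp", "Java", "Julia", "Typescript", "Python", "GO"),
  value = c(21, 21, 3, 5, 180, 9, 12, 21, 6, 0, 3, 6),
  usage = c(21, 21, 0, 99, 44, 48, 53, 21, 6, 8, 0, 6)
)

# Keep the first row for each distinct lang value, with all columns
distinct(df, lang, .keep_all = TRUE)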


Output:

        lang value usage
1       Java    21    21
2          C    21    21
3     Python     3     0
4         GO     5    99
5       RUST   180    44
6 Javascript     9    48
7        Cpp    12    53
8      Julia     6     6
9 Typescript     0     8
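
To make the effect of .keep_all concrete, the following sketch runs distinct() on the same data with and without it. The variable names lang_only and full_rows are illustrative and not part of the original example.

```r
library(dplyr)

df <- data.frame(
  lang  = c("Java", "C", "Python", "GO", "RUST", "Javascript",
            "Cpp", "Java", "Julia", "Typescript", "Python", "GO"),
  value = c(21, 21, 3, 5, 180, 9, 12, 21, 6, 0, 3, 6),
  usage = c(21, 21, 0, 99, 44, 48, 53, 21, 6, 8, 0, 6)
)

# Default (.keep_all = FALSE): only the lang column is returned
lang_only <- distinct(df, lang)

# .keep_all = TRUE: all columns are returned, taken from the first
# row in which each lang value appears
full_rows <- distinct(df, lang, .keep_all = TRUE)

names(lang_only)   # "lang"
names(full_rows)   # "lang" "value" "usage"

# "GO" occurs in rows 4 and 12 of df; the row with value 5 and
# usage 99 (the first occurrence) is the one that survives
full_rows[full_rows$lang == "GO", ]
```

Because the first occurrence is kept, the order of rows in df decides which values survive for a duplicated key. Sorting the data frame before calling distinct() is one way to control that.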

Remove duplicate rows based on multiple columns using dplyr in R

In this article, we will learn how to remove duplicate rows based on multiple columns using dplyr in the R programming language.

Dataframe in use:

         lang value usage
1        Java    21    21
2           C    21    21
3      Python     3     0
4          GO     5    99
5        RUST   180    44
6  Javascript     9    48
7         Cpp    12    53
8        Java    21    21
9       Julia     6     6
10 Typescript     0     8
11     Python     3     0
12         GO     6     6
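
As a minimal sketch of the multi-column case, the data frame above can be rebuilt and passed to distinct() with more than one column name, so that a row counts as a duplicate only when the combination of those columns repeats. The choice of value and usage as key columns and the name by_value_usage are assumptions made for illustration.

```r
library(dplyr)

# Rebuild the data frame shown above
df <- data.frame(
  lang  = c("Java", "C", "Python", "GO", "RUST", "Javascript",
            "Cpp", "Java", "Julia", "Typescript", "Python", "GO"),
  value = c(21, 21, 3, 5, 180, 9, 12, 21, 6, 0, 3, 6),
  usage = c(21, 21, 0, 99, 44, 48, 53, 21, 6, 8, 0, 6)
)

# Rows are duplicates when both value and usage match an earlier row;
# .keep_all = TRUE keeps the lang column in the result as well
by_value_usage <- distinct(df, value, usage, .keep_all = TRUE)

by_value_usage
nrow(by_value_usage)   # 8
```

Here "C" is dropped because it shares value 21 and usage 21 with the earlier "Java" row, even though its lang differs. The second "GO" row is dropped for the same reason: its pair (6, 6) already appears in the "Julia" row.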
