Levenshtein distance between two string columns of a data frame
To calculate the Levenshtein distance between two string columns of a data frame in R, we use the stringdist() function from the stringdist package. The function takes the two string columns as arguments and returns a numeric vector holding one Levenshtein distance per row.
Syntax: stringdist( string_data$column1, string_data$column2, method="lv" )
Parameters:
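- string_data: the data frame containing the string columns.
- column1 and column2: the string columns of the data frame whose Levenshtein distance is to be calculated.
- method: the distance metric to use. "lv" selects Levenshtein distance; if method is omitted, stringdist() defaults to "osa" (optimal string alignment), which is a different metric.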
Example: Here, we will calculate the Levenshtein distance between two string columns of a data frame.
R
# load the stringdist package
library(stringdist)

# sample string data frame
string_data <- data.frame(one = c("Priyank", "Abhiraj", "Sudhanshu"),
                          two = c("w3wiki", "Devraj", "Pawan"))

# calculate the Levenshtein distance row by row
string_data$levenshtein <- stringdist(string_data$one, string_data$two, method = "lv")

# print the data frame
string_data
Output:
        one    two levenshtein
1   Priyank w3wiki           7
2   Abhiraj Devraj           4
3 Sudhanshu  Pawan           7
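As a cross-check on the example above, the same column can be reproduced without any extra package. This is a minimal sketch that assumes base R's adist() (from the utils package, which computes a generalized Levenshtein distance) is an acceptable substitute; it is not the method the article uses, only a way to confirm the numbers.

```r
# same sample data as in the example above
string_data <- data.frame(one = c("Priyank", "Abhiraj", "Sudhanshu"),
                          two = c("w3wiki", "Devraj", "Pawan"))

# adist() returns a matrix of all pairs, so call it once per row
# and take the single cell to get a row-wise distance
string_data$levenshtein <- mapply(
  function(a, b) adist(a, b)[1, 1],
  string_data$one, string_data$two,
  USE.NAMES = FALSE
)

string_data$levenshtein  # 7 4 7
```

Calling adist() once per row avoids building the full all-pairs matrix, which would grow quadratically with the number of rows.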
How to Calculate Levenshtein Distance in R?
In this article, we will discuss how to calculate Levenshtein Distance in the R Programming Language.
The Levenshtein distance between two strings is the minimum number of single-character substitutions, insertions, and deletions required to turn one string into the other. For example, turning "kitten" into "sitting" takes three edits: substitute k with s, substitute e with i, and insert g at the end. In practice, Levenshtein distance is used in approximate string matching, spell-checking, and natural language processing.
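The definition above can be sketched directly as a dynamic-programming table (the Wagner-Fischer approach). This is an illustrative base R sketch only: lev_dist is a hypothetical helper name, not part of any package, and it is far slower than a compiled implementation.

```r
# lev_dist: hypothetical helper illustrating the definition of Levenshtein distance
lev_dist <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)

  # d[i + 1, j + 1] is the distance between the first i characters of a
  # and the first j characters of b
  d <- matrix(0L, nrow = n + 1, ncol = m + 1)
  d[, 1] <- 0:n  # delete every character of the prefix of a
  d[1, ] <- 0:m  # insert every character of the prefix of b

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost <- if (x[i] == y[j]) 0L else 1L
      d[i + 1, j + 1] <- min(
        d[i, j + 1] + 1L,  # deletion
        d[i + 1, j] + 1L,  # insertion
        d[i, j] + cost     # substitution, or a free match
      )
    }
  }
  d[n + 1, m + 1]
}

lev_dist("kitten", "sitting")  # 3
lev_dist("flaw", "lawn")       # 2
```

Each cell depends only on its left, upper, and upper-left neighbours, so the bottom-right cell ends up holding the minimum number of edits for the full strings.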
To calculate the Levenshtein distance in R, we use the stringdist() function from the stringdist package. The stringdist package provides functions for approximate string matching, fuzzy text search, and string distance calculation. The stringdist() function computes element-wise string distances between two character vectors, which may be single strings, longer vectors, or data frame columns; if one argument is shorter, it is recycled to match the other.
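A short sketch of the calls described above, assuming the stringdist package is already installed (for example via install.packages("stringdist")). The input strings are made up for illustration.

```r
library(stringdist)

# distance between two single strings
single <- stringdist("kitten", "sitting", method = "lv")
single  # 3

# element-wise distances between two vectors of equal length
pairwise <- stringdist(c("kitten", "flaw"), c("sitting", "lawn"), method = "lv")
pairwise  # 3 2

# all-pairs distances: one row per element of the first vector,
# one column per element of the second
all_pairs <- stringdistmatrix(c("kitten", "flaw"), c("sitting", "lawn"), method = "lv")
dim(all_pairs)  # 2 2
```

stringdist() is the element-wise form used for data frame columns, while stringdistmatrix() is the better fit when every string in one set must be compared against every string in another.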