Word Embedding
To understand semantic relationships between sentences, one must first be familiar with word embeddings. Word embeddings are vectorized representations of words. The simplest form is a one-hot vector; however, one-hot vectors are sparse, very high-dimensional, and capture no meaning. More advanced approaches such as Word2Vec (skip-gram and CBOW), GloVe, and FastText encode semantic information in a low-dimensional space. See the embedded link for a deeper treatment of word embeddings.
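The sketch below contrasts the two kinds of representation mentioned above: a sparse one-hot vector versus a dense Word2Vec embedding. It is a minimal illustration only, assuming the gensim library is installed and using a tiny toy corpus and hypothetical vocabulary.

```python
import numpy as np
from gensim.models import Word2Vec

vocab = ["king", "queen", "man", "woman"]  # hypothetical toy vocabulary

# One-hot: dimension equals vocabulary size, and no notion of meaning is captured.
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}
print(one_hot["king"])  # [1. 0. 0. 0.]

# Word2Vec (skip-gram): dense, low-dimensional vectors learned from word context.
corpus = [["king", "queen", "royal"], ["man", "woman", "person"],
          ["king", "man"], ["queen", "woman"]]
w2v = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=100)
print(w2v.wv["king"].shape)                # (50,) -- dense vector
print(w2v.wv.similarity("king", "queen"))  # cosine similarity between learned vectors
```

With a realistic corpus, semantically related words end up close together in this dense space, which is the property the sentence-level models below build on.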
Different Techniques for Sentence Semantic Similarity in NLP
Semantic similarity is the similarity between two words, phrases, or sentences. It measures how close or how different two pieces of text are in terms of their meaning and context.
In this article, we will focus on how the semantic similarity between two sentences is derived. We will cover the following widely used models:
- Doc2Vec – an extension of Word2Vec that learns embeddings for whole sentences or documents.
- SBERT – a Transformer-based model whose encoder captures the meaning of words in a sentence and produces a sentence embedding (a short usage sketch follows this list).
- InferSent – it uses a bi-directional LSTM to encode sentences and infer their semantics.
- USE (Universal Sentence Encoder) – a model trained by Google that generates fixed-size embeddings for sentences, which can be used for any downstream NLP task.
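As a preview of how such sentence encoders are used in practice, here is a minimal sketch of scoring sentence similarity with SBERT via the sentence-transformers package. The checkpoint name and the example sentences are assumptions chosen only for illustration.

```python
from sentence_transformers import SentenceTransformer, util

# "all-MiniLM-L6-v2" is one small, commonly used SBERT checkpoint (assumed here).
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["The cat sits on the mat.", "A cat is resting on a rug."]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity close to 1 means the sentences are semantically similar.
score = util.cos_sim(embeddings[0], embeddings[1])
print(float(score))
```

Each of the models listed above follows the same overall pattern: encode each sentence into a fixed-size vector, then compare the vectors with a similarity measure such as cosine similarity.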