Example of Positional Encoding
Let’s consider a simple example to illustrate the concept of positional encoding in the context of a Transformer model.
Suppose we have a Transformer model tasked with translating English sentences into French. One of the sentences in English is:
"The cat sat on the mat."
Before the sentence is fed into the Transformer model, it undergoes tokenization, where each word is converted into a token. Let’s assume the tokens for this sentence are:
["The", "cat", "sat", "on", "the", "mat"]
Next, each token is mapped to a high-dimensional vector representation through an embedding layer. These embeddings encode semantic information about the words in the sentence. However, they lack information about the order of the words.
Embeddings = {E1, E2, E3, E4, E5, E6}
where each Ei is a 4-dimensional vector.
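As a minimal sketch of this embedding step, the following NumPy snippet maps each token to a 4-dimensional vector by lookup in a table. The vocabulary and the random table values here are hypothetical placeholders; a real model learns the embedding table during training.

```python
import numpy as np

# Toy vocabulary built from the example sentence. "The" and "the" are
# distinct strings, so they get distinct (hypothetical) embeddings.
tokens = ["The", "cat", "sat", "on", "the", "mat"]
vocab = {tok: idx for idx, tok in enumerate(tokens)}

# Hypothetical embedding table of shape (vocab_size, d_model) with
# d_model = 4. Real models learn these values; we draw them at random.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 4))

# Embedding lookup: one 4-dimensional vector E1..E6 per token.
embeddings = embedding_table[[vocab[t] for t in tokens]]
print(embeddings.shape)  # (6, 4)
```

Note that these vectors are identical for repeated words regardless of where they occur, which is exactly why positional information must be added separately.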
This is where positional encoding comes into play. To ensure that the model understands the order of the words in the sequence, positional encodings are added to the word embeddings. These encodings provide each token with a unique positional representation.
Calculating Positional Encodings
- Let’s say the embedding dimensionality is 4 for simplicity.
- We’ll use sine and cosine functions to generate positional encodings. Consider the following positional encodings for the tokens in our example sentence:
For position pos = 1, 2, ..., 6 and embedding dimension 4, the encoding is

$$\text{PE}(pos) = \left[\sin\!\left(\tfrac{pos}{10000^{2 \cdot 0/4}}\right),\ \cos\!\left(\tfrac{pos}{10000^{2 \cdot 0/4}}\right),\ \sin\!\left(\tfrac{pos}{10000^{2 \cdot 1/4}}\right),\ \cos\!\left(\tfrac{pos}{10000^{2 \cdot 1/4}}\right)\right]$$

Since $10000^{0} = 1$ and $10000^{1/2} = 100$, this simplifies to $\text{PE}(pos) = [\sin(pos), \cos(pos), \sin(pos/100), \cos(pos/100)]$, giving PE(1) through PE(6) for the six tokens.
- These positional encodings are added element-wise to the word embeddings. The resulting vectors contain both semantic and positional information, allowing the Transformer model to understand not only the meaning of each word but also its position in the sequence.
- This example illustrates how positional encoding ensures that the Transformer model can effectively process and understand input sequences by incorporating information about the order of the tokens.
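The calculation above can be sketched in NumPy as follows. This is a minimal illustration, not a production implementation: positions run 1 through 6 to match PE(1) through PE(6) above (the original Transformer paper starts at 0), and the embeddings are zero placeholders standing in for the learned vectors E1 through E6.

```python
import numpy as np

d_model = 4
positions = np.arange(1, 7)       # positions 1 .. 6, matching PE(1)..PE(6)
i = np.arange(d_model // 2)       # sin/cos pair index: 0, 1

# Frequency 1 / 10000^(2i / d_model) shared by each sin/cos pair:
# [1.0, 0.01] for d_model = 4.
angle_rates = 1.0 / (10000 ** (2 * i / d_model))
angles = positions[:, None] * angle_rates[None, :]   # shape (6, 2)

pe = np.empty((len(positions), d_model))
pe[:, 0::2] = np.sin(angles)      # even dimensions 0, 2
pe[:, 1::2] = np.cos(angles)      # odd dimensions 1, 3

# Element-wise addition with the word embeddings (zeros here as a
# placeholder for the learned vectors E1..E6).
embeddings = np.zeros((6, d_model))
inputs = embeddings + pe
print(pe[0])  # PE(1) = [sin(1), cos(1), sin(0.01), cos(0.01)]
```

Because each position produces a distinct pattern of sines and cosines, two occurrences of the same word (such as "The" and "the" in the example) end up with different input vectors after the addition.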
Positional Encoding in Transformers
In the domain of natural language processing (NLP), Transformer models have fundamentally reshaped our approach to sequence-to-sequence tasks. However, unlike conventional recurrent neural networks (RNNs) or convolutional neural networks (CNNs), Transformers have no inherent awareness of token order. In this article, we examine the significance of positional encoding, the technique that equips Transformer models with an understanding of sequence order.