Dilated and Global Sliding Window Attention - FAQs
Q. What is Attention Mechanism?
The attention mechanism allows a model to selectively focus on specific parts of an input sequence or image. Its core components are queries, key-value pairs, and attention scores, which are used to weigh and combine information.
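To make the query/key/value terminology concrete, here is a minimal NumPy sketch of scaled dot-product attention. The function name, array shapes, and random toy inputs are purely illustrative, not taken from any particular library.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d) arrays of query, key, and value vectors
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                        # attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V                                   # weighted sum of values

# Toy example: 4 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one context vector per token
```

Each row of the output is a weighted average of the value vectors, with weights determined by how strongly the corresponding query matches each key.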
Q. What is Dilated Attention?
Dilated attention borrows the idea behind dilated convolutions: each position attends to tokens spaced apart by a fixed gap (the dilation rate), so the receptive field expands without adding parameters or computation. In the context of attention mechanisms, this changes which parts of the input sequence or image the model attends to.
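The sketch below shows one way a dilated attention pattern can be written as a boolean mask over query-key pairs. The function name and parameters (window, dilation) are illustrative assumptions, not an API from a specific framework.

```python
import numpy as np

def dilated_attention_mask(seq_len, window, dilation):
    # mask[i, j] is True when query position i may attend to key position j.
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for i in range(seq_len):
        for k in range(-window, window + 1):
            j = i + k * dilation
            if 0 <= j < seq_len:
                mask[i, j] = True
    return mask

# window=2, dilation=2: each token sees neighbours at offsets -4, -2, 0, +2, +4,
# so the receptive field is wider than with an undilated window of the same size.
print(dilated_attention_mask(8, window=2, dilation=2).astype(int))
```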
Q. What is sliding window attention?
Sliding Window Attention is an attention mechanism used in natural language processing for sequential input, such as word sequences. Instead of every token attending to every other token, each token attends only to a fixed-size window of neighbouring tokens, so the sequence is effectively covered by overlapping windows. Attention scores are computed independently within each window, determining how much weight the model gives to nearby context when making predictions.
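The following sketch applies a sliding window directly inside the attention computation by masking out keys that fall outside each query's window. As before, the function name and parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def sliding_window_attention(Q, K, V, window):
    # Each query attends only to keys within `window` positions on either side.
    seq_len, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    idx = np.arange(seq_len)
    outside = np.abs(idx[:, None] - idx[None, :]) > window
    scores[outside] = -np.inf                            # block out-of-window keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over allowed keys
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(6, 8))
K = rng.normal(size=(6, 8))
V = rng.normal(size=(6, 8))
out = sliding_window_attention(Q, K, V, window=1)  # each token sees itself and one neighbour per side
print(out.shape)  # (6, 8)
```

Because each token only looks at a constant-size window, the cost grows linearly with sequence length rather than quadratically, which is the key benefit for long inputs.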
Dilated and Global Sliding Window Attention
“Dilated” and “Global Sliding Window” attention are adaptations of the attention mechanism used in neural networks, particularly in natural language processing and computer vision.
Prerequisites: Attention Mechanism | ML, Sliding Window Attention, Dilated CNN
Transformer-based models such as BERT and SpanBERT have been used for numerous Natural Language Processing tasks, but their full self-attention mechanism limits their potential: they frequently fail to process and comprehend data that contains lengthy texts. In 2020, the Longformer (Long-Document Transformer) entered the scene to address this. Longformer targets the problems posed by long sequences, i.e. inputs longer than 512 tokens. To achieve this it adopts a CNN-like attention pattern called Sliding Window Attention, which covers lengthy input texts efficiently, and it combines this sliding window with sparse global attention to manage long sequences.
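A short usage sketch with the Hugging Face transformers library illustrates the combination of local sliding-window attention and global attention in Longformer. This assumes the transformers and torch packages are installed and the allenai/longformer-base-4096 checkpoint is available for download; it is an illustrative example rather than the only way to use the model.

```python
import torch
from transformers import LongformerTokenizer, LongformerModel

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = LongformerModel.from_pretrained("allenai/longformer-base-4096")

# A long document (repeated text stands in for real content here)
text = "Long documents exceed the 512-token limit of standard transformers. " * 100
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096)

# Local sliding-window attention everywhere; global attention on the first token
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

with torch.no_grad():
    outputs = model(**inputs, global_attention_mask=global_attention_mask)

print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```

Tokens marked in the global attention mask attend to, and are attended by, every other token, while the rest use only the local sliding window, keeping memory use manageable for long documents.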