"Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" is a research paper that introduces a novel approach to improve the performance of language models when dealing with long contexts. This innovative method addresses the challenges associated with processing extended sequences of text, which is crucial for tasks requiring comprehensive understanding of large documents or conversations.
Plug-and-Play Positional Encoding
The core innovation of this research is the introduction of a plug-and-play positional encoding technique. This method allows language models to better utilize long contexts by modifying how they process and interpret the position of tokens within a sequence.
Middle-Out Processing
The paper proposes a "middle-out" approach to context processing: rather than letting attention concentrate on the beginning and end of the input, the method rescales positional information so that tokens in the middle of a long context receive adequate attention. This contrasts with standard left-to-right processing, under which mid-context information is often "lost in the middle."
The proposed method, Multi-scale Positional Encoding (Ms-PoE), builds on the standard transformer architecture but introduces a modified positional encoding scheme. Its foundation is a rescaling of the Rotary Position Embedding (RoPE), which encodes absolute positions by rotating pairs of embedding dimensions:
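The core mechanism can be illustrated with a small sketch. Conceptually, Ms-PoE assigns a different position-scaling ratio to each attention head, so that some heads see compressed position indices. The function name, the ratio range, and the linear head-wise interpolation below are illustrative assumptions for exposition, not the paper's exact head-ordering procedure:

```python
import numpy as np

def multiscale_positions(seq_len, num_heads, min_ratio=1.0, max_ratio=3.0):
    """Illustrative multi-scale position rescaling (assumed ratio schedule).

    Returns a (num_heads, seq_len) array in which head h uses position
    indices divided by its own scaling ratio r_h. Heads with larger
    ratios effectively compress positional distances, which is the
    intuition behind multi-scale positional encoding.
    """
    # One scaling ratio per head, linearly spaced (an assumption here;
    # the paper derives per-head ratios differently).
    ratios = np.linspace(min_ratio, max_ratio, num_heads)
    positions = np.arange(seq_len, dtype=np.float64)
    # Broadcast: each head sees its own rescaled position indices.
    return positions[None, :] / ratios[:, None]
```

Each head's rescaled indices would then be fed into the positional encoding (e.g., the RoPE rotation below) in place of the raw token positions.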
$$ R_{\theta}(x) = [x_1 \cos(\theta) - x_2 \sin(\theta), x_1 \sin(\theta) + x_2 \cos(\theta)] $$
where $x = (x_1, x_2)$ is a pair of token-embedding dimensions and $\theta$ is the position-dependent rotation angle.
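As a concrete illustration, the 2D rotation above is applied to every consecutive pair of embedding dimensions, with a per-pair frequency schedule. A minimal NumPy sketch follows; the frequency schedule $\theta_i = \text{base}^{-2i/d}$ is the standard RoPE choice, and the function name is illustrative:

```python
import numpy as np

def rope_rotate(x, position, base=10000.0):
    """Apply a RoPE-style rotation to a single token embedding.

    x        : 1-D embedding of even dimension d
    position : integer position m of the token in the sequence
    Each consecutive pair (x[2i], x[2i+1]) is rotated by the angle
    m * theta_i, where theta_i = base ** (-2i / d).
    """
    d = x.shape[0]
    assert d % 2 == 0, "embedding dimension must be even"
    i = np.arange(d // 2)
    theta = base ** (-2.0 * i / d)   # per-pair frequencies
    angle = position * theta         # position-dependent angles
    x1, x2 = x[0::2], x[1::2]        # split embedding into 2D pairs
    # The 2D rotation from the equation above, applied pair-wise.
    rot1 = x1 * np.cos(angle) - x2 * np.sin(angle)
    rot2 = x1 * np.sin(angle) + x2 * np.cos(angle)
    out = np.empty_like(x)
    out[0::2], out[1::2] = rot1, rot2
    return out
```

Because each pair undergoes a pure rotation, the embedding's norm is preserved, and the dot product between two rotated vectors depends only on their relative positions, which is the property that makes RoPE attractive for attention.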