Context Rot
The effects
Context rot is the performance degradation that LLMs experience as the length of the input context grows.
Although leading models now advertise context windows of a million tokens or more, performance degrades well before that limit, so in practice you work with far fewer tokens than the window nominally allows. Past a certain threshold, hallucinations and errors become more frequent. Cost compounds the problem: every token is reprocessed on every turn, so a bloated context is slower and more expensive too.
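To make the cost point concrete, here is a minimal arithmetic sketch. It assumes the full history is resent on every turn with no prompt caching, and that each turn adds a fixed number of tokens. The function name and the 1,000-tokens-per-turn figure are illustrative assumptions, not measurements.

```python
def cumulative_input_tokens(tokens_per_turn: int, turns: int) -> int:
    """Total input tokens processed over a conversation, assuming the
    whole history is resent on every turn (no caching)."""
    total = 0
    context = 0
    for _ in range(turns):
        context += tokens_per_turn  # the history grows by one turn
        total += context            # the entire history is processed again
    return total


if __name__ == "__main__":
    # Illustrative figure: 1,000 new tokens per turn.
    for turns in (10, 50, 100):
        print(turns, cumulative_input_tokens(1_000, turns))
```

Under these assumptions the total grows roughly with the square of the number of turns: ten times as many turns means about a hundred times as many tokens processed.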
The causes
Lost in the Middle
LLMs perform best when the relevant information sits at the beginning or end of the input. When it sits in the middle of a long input, retrieval performance degrades considerably, even in models specifically designed for long contexts (Liu et al., 2023).
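One way to check for this effect on your own setup is to place the same key document at different positions among filler documents and compare the answers. The sketch below only builds the prompts. The helper names and the "Document [n]:" layout are assumptions made for illustration, not the exact protocol of the cited paper.

```python
def build_probe(filler_docs: list[str], key_doc: str, position: int) -> str:
    """Return a context with key_doc inserted at the given index
    among the filler documents (0 = first)."""
    docs = list(filler_docs)  # copy so the caller's list is untouched
    docs.insert(position, key_doc)
    return "\n\n".join(f"Document [{i + 1}]: {d}" for i, d in enumerate(docs))


def probe_positions(n_filler: int) -> list[int]:
    """Start, middle, and end insertion points for n_filler filler docs."""
    return [0, n_filler // 2, n_filler]


if __name__ == "__main__":
    filler = [f"Unrelated text number {i}." for i in range(4)]
    for pos in probe_positions(len(filler)):
        prompt = build_probe(filler, "The access code is 7421.", pos)
        print(f"--- key document at index {pos} ---")
        print(prompt)
```

Sending each prompt with the same question and comparing accuracy by position would show whether the middle placement does worse for a given model.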
Distraction by Irrelevant Context
Irrelevant context significantly degrades reliability, because the model must first separate the relevant information from the rest before it can answer. In other words, what matters is not just how many tokens there are but how much noise the model has to filter out (Shi et al., 2023).
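As a rough illustration of "noise" as something distinct from raw length, the sketch below estimates what fraction of a context's words sit in passages that share no terms with the question. Lexical overlap is a crude stand-in for relevance, assumed here only for simplicity. The function names and the threshold are hypothetical.

```python
import re


def _terms(text: str) -> set[str]:
    """Lowercased alphanumeric terms in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def noise_fraction(question: str, passages: list[str], min_overlap: int = 1) -> float:
    """Fraction of words in passages that share fewer than min_overlap
    terms with the question. Common words such as "of" also count as
    overlap, so this underestimates noise on real text."""
    q_terms = _terms(question)
    total_words = 0
    noise_words = 0
    for passage in passages:
        n_words = len(passage.split())
        total_words += n_words
        if len(q_terms & _terms(passage)) < min_overlap:
            noise_words += n_words
    return noise_words / total_words if total_words else 0.0


if __name__ == "__main__":
    passages = [
        "Paris is the capital of France",
        "Bananas are yellow fruit",
    ]
    print(noise_fraction("capital of France", passages))
```

Two contexts of the same length can score very differently on a measure like this, which is the distinction the paragraph above draws.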