In this paper, authors from UCSB and Microsoft Research propose LONGMEM, a framework that lets language models cache long-form prior context or knowledge in a non-differentiable memory bank and use it through a decoupled memory module, which addresses the memory staleness problem. To decouple the memory, they introduce a novel residual side network (SideNet). A frozen backbone LLM extracts paired attention keys and values from the previous context and stores them in the memory bank. Because the backbone stays frozen, these cached keys and values remain consistent with the model that produced them and do not go stale as training proceeds; only the SideNet is trained. In the SideNet's memory-augmented layer, the attention query of the current input retrieves the cached key-value pairs of earlier contexts. The retrieved memory augmentations are then fused into the hidden states being learned via a joint-attention mechanism.
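
The cache-retrieve-fuse loop described above can be illustrated with a toy, pure-Python sketch. It is not the authors' implementation: the names `MemoryBank` and `joint_attention`, the FIFO eviction policy, the brute-force top-k search (standing in for whatever retrieval index a real system would use) and the single scalar gate `gate_logit` are all assumptions made for illustration. The sketch only shows the shape of the idea: keys and values from earlier context are stored as plain numbers, the current query pulls back the best-matching pairs, and a sigmoid gate mixes attention over local context with attention over the retrieved memory.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: Sequence[Vector], values: Sequence[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class MemoryBank:
    """Hypothetical fixed-size cache of (key, value) pairs from earlier context.

    The pairs are stored as plain lists, so nothing here is differentiable.
    FIFO eviction of the oldest pairs is an assumption of this sketch.
    """

    def __init__(self, capacity: int) -> None:
        self.entries: deque = deque(maxlen=capacity)

    def cache(self, keys: Sequence[Vector], values: Sequence[Vector]) -> None:
        for k, v in zip(keys, values):
            self.entries.append((list(k), list(v)))  # oldest pairs drop off when full

    def retrieve(self, query: Vector, top_k: int) -> Tuple[List[Vector], List[Vector]]:
        """Return the top_k cached pairs whose keys score highest against the query."""
        ranked = sorted(self.entries, key=lambda kv: dot(query, kv[0]), reverse=True)
        best = ranked[:top_k]
        return [k for k, _ in best], [v for _, v in best]


def joint_attention(query: Vector, local_keys: Sequence[Vector],
                    local_values: Sequence[Vector], bank: MemoryBank,
                    top_k: int, gate_logit: float) -> Vector:
    """Mix local attention A with attention M over retrieved memory.

    Output = sigmoid(g) * A + (1 - sigmoid(g)) * M, where g would be a learned
    parameter; here it is just a float passed in (hypothetical).
    """
    local_out = attend(query, local_keys, local_values)
    mem_keys, mem_values = bank.retrieve(query, top_k)
    if not mem_keys:  # empty bank: fall back to local attention only
        return local_out
    mem_out = attend(query, mem_keys, mem_values)
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [g * a + (1.0 - g) * m for a, m in zip(local_out, mem_out)]


# Usage: cache two pairs from "earlier context", then attend from a new query.
bank = MemoryBank(capacity=4)
bank.cache(keys=[[1.0, 0.0], [0.0, 1.0]], values=[[10.0, 0.0], [0.0, 10.0]])
out = joint_attention([1.0, 0.0], [[1.0, 0.0]], [[1.0, 1.0]], bank,
                      top_k=1, gate_logit=0.0)
print(out)  # [5.5, 0.5]
```

With `gate_logit=0.0` the gate is 0.5, so the output is an even blend of the local result `[1.0, 1.0]` and the retrieved memory value `[10.0, 0.0]`. Keeping the memory side as plain stored numbers mirrors the point made above: because the cached pairs never receive gradients, only the fusion path would need training.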

Paper:

Augmenting Language Models with Long-Term Memory