Memorization Capacity of Multi-Head Attention in Transformers

Publication
ICLR 2024, International Conference on Learning Representations (Spotlight)