Memorization Capacity of Multi-Head Attention in Transformers

Publication
arXiv preprint