Transformer Attention Mechanisms
This cluster centers on discussions referencing the 'Attention Is All You Need' paper, the Transformer architecture, attention mechanisms, and their applications in LLMs and vision models, often asking how new research relates to or extends these ideas.
(Charts omitted: Activity Over Time, Top Contributors, Keywords)
Sample Comments
You forgot attention mechanisms, that's a huge one
Can you elaborate on how Transformers are "convolutions with attention"?
Just asking, this seems very similar to the attention algorithm that powers LLMs?
The very idea of the Transformer architecture. Surely you've heard of "Attention is all you need".
I think the parent comment is referring to "Attention is All You Need", famous transformer paper.
Isn't that how previous models were, before the attention is all you need paper?
Didn't "Attention Is All you Need" bill transformers primarily as a translation model?
Does this apply to non-transformer based architectures as well?
Why do you call your language model “transformer”?
I thought you were asking about attention only transformers. This paper touches on some of it https://arxiv.org/abs/2212.10559v2.
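For context on what these comments are pointing at: the "attention algorithm that powers LLMs" is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, introduced in 'Attention Is All You Need'. Below is a minimal NumPy sketch; the function name, shapes, and example data are illustrative only and are not drawn from any of the discussions above.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Shapes (illustrative): Q is (n, d_k), K is (m, d_k), V is (m, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (n, m): similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax over the keys
    return weights @ V                                 # each output row is a weighted mix of the values

# Illustrative use: 4 query positions attending over 6 key/value positions.
rng = np.random.default_rng(0)
Q, K = rng.normal(size=(4, 8)), rng.normal(size=(6, 8))
V = rng.normal(size=(6, 16))
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 16)
```

The 1/sqrt(d_k) scaling, as described in the paper, keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with very small gradients.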