Transformer Attention Mechanisms
This cluster centers on discussions referencing the 'Attention Is All You Need' paper, the Transformer architecture, attention mechanisms, and their applications in LLMs and vision models, often asking how new research relates to or extends these ideas.
(Charts omitted: Activity Over Time, Top Contributors, Keywords)
Sample Comments
You forgot attention mechanisms, that's a huge one
Can you elaborate on how Transformers are "convolutions with attention"?
Just asking, this seems very similar to the attention algorithm that powers LLMs?
The very idea of the Transformer architecture. Surely you've heard of "Attention is all you need".
I think the parent comment is referring to "Attention is All You Need", famous transformer paper.
Isn't that how previous models were, before the attention is all you need paper?
Didn't "Attention Is All you Need" bill transformers primarily as a translation model?
Does this apply to non-transformer based architectures as well?
Why do you call your language model “transformer”?
I thought you were asking about attention only transformers. This paper touches on some of it https://arxiv.org/abs/2212.10559v2.
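For context on what these comments are pointing at: the "attention algorithm that powers LLMs" is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, introduced in 'Attention Is All You Need'. Below is a minimal NumPy sketch; the function name, shapes, and example data are illustrative only and are not drawn from any of the discussions above.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Shapes (illustrative): Q is (n, d_k), K is (m, d_k), V is (m, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (n, m): similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax over the keys
    return weights @ V                                 # each output row is a weighted mix of the values

# Illustrative use: 4 query positions attending over 6 key/value positions.
rng = np.random.default_rng(0)
Q, K = rng.normal(size=(4, 8)), rng.normal(size=(6, 8))
V = rng.normal(size=(6, 16))
print(scaled_dot_product_attention(Q, K, V).shape)     # (4, 16)
```

The 1/sqrt(d_k) scaling, as described in the paper, keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with very small gradients.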