Transformer Attention Mechanisms

This cluster centers on discussions of the "Attention Is All You Need" paper, the Transformer architecture, and attention mechanisms, along with their applications in LLMs and vision models; commenters frequently ask how new research relates to or extends these ideas.
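
For context, the mechanism these discussions keep circling back to is the scaled dot-product attention introduced in the paper: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal NumPy sketch of that formula; the function name, comments, and toy data are illustrative, not taken from the paper's code or any particular library.

    # Minimal scaled dot-product attention, the core operation of the
    # Transformer ("Attention Is All You Need", Vaswani et al., 2017).
    # Illustrative sketch only: names and shapes here are assumptions.
    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                 # scaled query-key similarity
        scores -= scores.max(axis=-1, keepdims=True)    # shift for numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V                              # attention-weighted sum of values

    # Toy usage: a sequence of 4 tokens with 8-dimensional projections.
    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
    print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)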

Trend: ➡️ Stable (0.7x) · Category: AI & Machine Learning
Comments: 4,437
Years Active: 12
Top Authors: 5
Topic ID: #3106

Activity Over Time

Year  Comments
2015         6
2016        22
2017        35
2018        48
2019       141
2020       174
2021       131
2022       213
2023     1,374
2024     1,153
2025     1,056
2026        84

Keywords

LLM, GWO, C.f, HN, RGB, MLA, medium.com, paperswithcode.com, papers.nips, FFHQ256, transformer, attention, transformers, layer, token, paper, models, tokens, architecture, sequence

Sample Comments

deepnotderp, May 7, 2017

You forgot attention mechanisms, that's a huge one

phowon, Feb 3, 2019

Can you elaborate on how Transformers are "convolutions with attention"?

spyckie2, Jul 12, 2023

Just asking, this seems very similar to the attention algorithm that powers LLMs?

RevEng, Dec 28, 2024

The very idea of the Transformer architecture. Surely you've heard of "Attention is all you need".

mks_shuffle, Jul 31, 2025

I think the parent comment is referring to "Attention is All You Need", famous transformer paper.

stri8ed, Apr 11, 2024

Isn't that how previous models were, before the attention is all you need paper?

herculity275, Apr 23, 2024

Didn't "Attention Is All you Need" bill transformers primarily as a translation model?

primordialsoup, Mar 3, 2022

Does this apply to non-transformer based architectures as well?

p1esk, Sep 1, 2024

Why do you call your language model “transformer”?

adamnemecek, Feb 11, 2023

I thought you were asking about attention only transformers. This paper touches on some of it https://arxiv.org/abs/2212.10559v2.