x86 Decoding Complexity

Discussions center on the challenges of decoding variable-length x86 instructions in modern CPUs, including micro-op caches, decoder area costs, and comparisons to fixed-width ISAs like ARM, RISC-V, and VLIW alternatives.

📉 Falling 0.4x · Hardware

Comments: 5,403

Topic ID: #7800

Activity Over Time (comments per year, 2008 to 2026; no counts shown for 2008 to 2012 or 2026)

2013: 134
2014: 197
2015: 227
2016: 292
2017: 349
2018: 413
2019: 354
2020: 678
2021: 540
2022: 511
2023: 629
2024: 468
2025: 414

Top Contributors

monocasa (98) Symmetry (78) gpderetta (68) mhh__ (57) userbinator (54)

Keywords

RAM e.g CPU MIPS A12 ROB FP P6 docs.boom-core.org instructions decode instruction x86 intel instruction set cache decoding cpu isa

Sample Comments (from Hacker News)

smoldesu • Sep 5, 2022

Hasn't Intel been shipping scalar instructions for more than a decade?

fooblaster • Jun 18, 2023

I'm not sure how you could write something like this without considering the micro-op cache, which is present in all modern x86 and some ARM processors. On x86, the micro-op cache is effectively the only way a processor can reach full IPC, and that's because it contains pre-decoded instructions. We don't know the formats here, but we can guarantee that they are fixed-length instructions and that branch instructions are annotated. Yeah sure, th
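
A toy model can make the point about pre-decoded instructions concrete. This sketch is hypothetical throughout: `UopCache`, `legacy_decode`, and the cycle costs are made-up stand-ins, entries are keyed by fetch address with no capacity limit or eviction, and real micro-op caches differ in indexing and entry format. It only shows that once a hot loop's instructions have been decoded into fixed-format entries (with branches flagged at decode time), later passes skip the expensive variable-length decode.

```python
from dataclasses import dataclass

# Assumed costs, chosen only for illustration.
LEGACY_DECODE_CYCLES = 4  # hypothetical cost of decoding on a cache miss
UOP_CACHE_HIT_CYCLES = 1  # hypothetical cost of reading a cached entry


@dataclass(frozen=True)
class Uop:
    op: str
    is_branch: bool = False  # branch flag recorded once, at decode time


def legacy_decode(addr, program):
    """Hypothetical stand-in for the variable-length decoder.

    Returns a tuple of fixed-format uops for the instruction at `addr`.
    """
    mnemonic = program[addr]
    return (Uop(mnemonic, is_branch=mnemonic.startswith("j")),)


class UopCache:
    """Unbounded, address-keyed cache of decoded uops (no eviction modeled)."""

    def __init__(self):
        self.entries = {}
        self.cycles = 0
        self.hits = 0
        self.misses = 0

    def fetch(self, addr, program):
        if addr in self.entries:
            self.hits += 1
            self.cycles += UOP_CACHE_HIT_CYCLES
        else:
            self.misses += 1
            self.cycles += LEGACY_DECODE_CYCLES
            self.entries[addr] = legacy_decode(addr, program)
        return self.entries[addr]


# A three-instruction loop body executed ten times.
program = {0x10: "add", 0x13: "cmp", 0x16: "jne"}
cache = UopCache()
for _ in range(10):
    for addr in sorted(program):
        cache.fetch(addr, program)

print(cache.hits, cache.misses, cache.cycles)  # 27 3 39
```

With these assumed costs, the loop pays for the slow decode path only on its first pass: 27 of the 30 fetches hit.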

iso-8859-1 • Jan 4, 2018

Is the Mill architecture solving problems orthogonal to this?

zozbot234 • Feb 22, 2023

Not if you have a complex ISA like x86 and want a very wide decode.

kersplody • Jul 4, 2022

Not really an issue under most usage scenarios. x86 suffers mostly from the cost and complexity of the decode unit. Once decoded instructions are cached, the performance penalty mostly disappears.

makapuf • May 31, 2019

Can't argue with 1. For 2, it's still an instruction set separate from uops, so you might not be as sensitive to changes; you still get a level of indirection. For 3... it depends. You might gain in power if you use the newer one more, and you might as well make the older one simpler and settle for maybe 90% of the speed. But maybe the decoding of instructions that counts is not that expensive compared to OoO machinery, branch predictors and 512-bit ALUs.
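
The "level of indirection" between the ISA and uops can be illustrated with a toy cracker. This is a hypothetical sketch, not how any real decoder works: the `op dst, src` text format, the `[reg]` memory syntax, the temporaries `tmp0`/`tmp1`, and the uop tuples are all invented. It shows a read-modify-write instruction splitting into load, ALU, and store uops while a register-only form maps to a single uop, so the internal uop format could change without touching the architectural instruction set.

```python
def crack(instr):
    """Split a toy 'op dst, src' instruction into internal uops.

    Memory operands are written as [reg]; tmp0/tmp1 are hypothetical
    internal registers that are not visible to the ISA.
    """
    op, operands = instr.split(" ", 1)
    dst, src = (s.strip() for s in operands.split(","))
    uops = []
    if src.startswith("["):                # memory source: load it first
        uops.append(("load", "tmp1", src))
        src = "tmp1"
    if dst.startswith("["):                # read-modify-write destination
        uops.append(("load", "tmp0", dst))
        uops.append((op, "tmp0", "tmp0", src))
        uops.append(("store", dst, "tmp0"))
    else:                                  # register destination
        uops.append((op, dst, dst, src))
    return uops


for text in ("add eax, ebx", "add eax, [rsi]", "add [rdi], eax"):
    print(f"{text:16} -> {crack(text)}")
```

The same architectural `add` yields one, two, or three uops depending on its operands; only the cracking table would need to change if the internal format did.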

stephencanon • Jun 7, 2023

Most SIMD and FP instructions are not microcoded in a modern mainstream CPU, FWIW.

sargun • Jan 4, 2018

Certain Atom CPUs have no speculative execution nor OoO execution.

chroma • Dec 11, 2018

I don't think that's an advantage these days. The bottleneck seems to be decoding instructions, and that's easier to parallelize if instructions are fixed width. Case in point: The big cores on Apple's A11 and A12 SoCs can decode 7 instructions per cycle. Intel's Skylake can do 5. Intel CPUs also have μop caches because decoding x86 is so expensive.
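
The parallelism argument can be sketched with a made-up encoding. The 4-byte fixed width and the `toy_length` rule (the low two bits of the first byte give the length minus one) are assumptions for illustration; real x86 instructions run from 1 to 15 bytes, and their length depends on prefixes, opcode, ModRM, SIB, displacement and immediate fields. With fixed width every start offset is known up front; with variable length each start depends on the previous instruction's length. The third function shows a commonly described workaround: compute a candidate length at every byte offset independently, then follow the chain, which keeps the expensive length logic off the serial path.

```python
FIXED_WIDTH = 4  # assumed fixed instruction size in bytes


def fixed_boundaries(code):
    """Fixed width: each start is i * FIXED_WIDTH, independent of the bytes."""
    return list(range(0, len(code), FIXED_WIDTH))


def toy_length(code, pos):
    """Hypothetical encoding: low 2 bits of the first byte are length - 1."""
    return (code[pos] & 0b11) + 1


def variable_boundaries(code):
    """Serial scan: instruction k+1's start is known only after decoding k."""
    starts, pos = [], 0
    while pos < len(code):
        starts.append(pos)
        pos += toy_length(code, pos)
    return starts


def variable_boundaries_speculative(code):
    """Length-decode every byte offset (independent work), then chain."""
    lengths = [toy_length(code, i) for i in range(len(code))]
    starts, pos = [], 0
    while pos < len(code):
        starts.append(pos)
        pos += lengths[pos]  # only a cheap add remains on the serial path
    return starts


code = bytes([0x03, 0xAA, 0xBB, 0xCC, 0x00, 0x01, 0xDD, 0x02, 0xEE, 0xFF])
print(fixed_boundaries(bytes(16)))            # [0, 4, 8, 12]
print(variable_boundaries(code))              # [0, 4, 5, 7]
print(variable_boundaries_speculative(code))  # [0, 4, 5, 7]
```

The speculative version wastes work on offsets that turn out not to be instruction starts (here, 6 of the 10 computed lengths go unused), which is one way to see where extra decoder area and power can come from.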

zelos • May 1, 2018

Games console CPUs support those kinds of instructions, don't they? To some extent, didn't Intel go down this road with VLIW, trying to shift the burden of making code fast onto the compiler instead of the CPU?