x86 Decoding Complexity

Discussions center on the challenges of decoding variable-length x86 instructions in modern CPUs, including micro-op caches and decoder area costs, and on comparisons with fixed-width ISAs such as ARM and RISC-V and with VLIW alternatives.

📉 Falling 0.4x Hardware
Comments: 5,403
Years Active: 19
Top Authors: 5
Topic ID: #7800

Activity Over Time

2008: 9
2009: 24
2010: 33
2011: 50
2012: 66
2013: 134
2014: 197
2015: 227
2016: 292
2017: 349
2018: 413
2019: 354
2020: 678
2021: 540
2022: 511
2023: 629
2024: 468
2025: 414
2026: 17

Keywords

RAM e.g CPU MIPS A12 ROB FP P6 docs.boom core.org instructions decode instruction x86 intel instruction set cache decoding cpu isa

Sample Comments

smoldesu Sep 5, 2022 View on HN

Hasn't Intel been shipping scalar instructions for more than a decade?

fooblaster Jun 18, 2023 View on HN

I'm not sure how you could write something like this without considering something like the micro op cache, which is present in all modern x86 and some arm processors. The micro op cache on x86 is effectively the only way an x86 processor can get full ipc performance, and that's because it contains pre decoded instructions. We don't know the formats here, but we can guarantee that they are fixed length instructions and that they have branch instructions annotated. Yeah sure, th

iso-8859-1 Jan 4, 2018 View on HN

Is the Mill architecture solving problems orthogonal to this?

zozbot234 Feb 22, 2023 View on HN

Not if you have a complex ISA like x86 and want a very wide decode.

kersplody Jul 4, 2022 View on HN

Not really an issue under most usage scenarios. x86 suffers mostly from the cost and complexity of the decode unit. Once decoded instructions are cached, the performance penalty mostly disappears.

makapuf May 31, 2019 View on HN

Can't argue with 1. For 2 it's still an instruction set separate from uops so you might not be as sensitive to changes you still get a level of indirection. For 3 .. it depends. You might gain in power if you use the newer one more and you might as well make the older one simpler to achieve 90% of speed maybe. But maybe the decoding of instructions that counts is not that expensive compared to OOO branch predictors and 512 bits ALUs

stephencanon Jun 7, 2023 View on HN

Most SIMD and FP instructions are not microcoded in a modern mainstream CPU, FWIW.

sargun Jan 4, 2018 View on HN

Certain Atom CPUs have no speculative execution nor OoO execution.

chroma Dec 11, 2018 View on HN

I don't think that's an advantage these days. The bottleneck seems to be decoding instructions, and that's easier to parallelize if instructions are fixed width. Case in point: The big cores on Apple's A11 and A12 SoCs can decode 7 instructions per cycle. Intel's Skylake can do 5. Intel CPUs also have μop caches because decoding x86 is so expensive.

zelos May 1, 2018 View on HN

Games console CPUs support those kind of instructions, don't they? To some extent, didn't Intel go down this road with VLIW: trying to shift the burden of making code fast onto the compiler, instead of the CPU?
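
Illustrative sketch

Several of the comments above argue that fixed-width instructions are easier to decode in parallel than variable-length ones. The Python sketch below is one way to picture that claim. It is a toy under stated assumptions, not a model of x86 or of any real decoder: the encoding (the first byte of each instruction holds its total length) and the function names `variable_length_starts` and `fixed_width_starts` are invented for illustration.

```python
def variable_length_starts(code: bytes) -> list[int]:
    """Find instruction start offsets in a TOY variable-length encoding.

    Assumption (hypothetical, not x86): the first byte of every
    instruction is its total length in bytes.
    """
    starts = []
    pc = 0
    while pc < len(code):
        length = code[pc]
        if length == 0 or pc + length > len(code):
            raise ValueError(f"malformed instruction at offset {pc}")
        starts.append(pc)
        # The next start is unknown until this length has been read,
        # so the walk is inherently sequential.
        pc += length
    return starts


def fixed_width_starts(code: bytes, width: int = 4) -> list[int]:
    """Find instruction start offsets when every instruction is `width` bytes.

    Each offset depends only on its index, so all of them could be
    computed independently (in hardware terms, in parallel).
    """
    return [i * width for i in range(len(code) // width)]


if __name__ == "__main__":
    toy = bytes([1, 3, 0xAA, 0xBB, 2, 0xCC, 1])
    print(variable_length_starts(toy))    # serial walk over length bytes
    print(fixed_width_starts(bytes(16)))  # offsets known up front
```

In the variable-length case each start offset depends on the previous instruction's length. Caching already-decoded instructions, as the micro-op cache comments describe, avoids repeating that serial walk.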