x86 Decoding Complexity
Discussions center on the challenges of decoding variable-length x86 instructions in modern CPUs, including micro-op caches, decoder area costs, and comparisons to fixed-width ISAs like ARM and RISC-V and to VLIW alternatives.
Sample Comments
Hasn't Intel been shipping scalar instructions for more than a decade?
I'm not sure how you could write something like this without considering something like the micro-op cache, which is present in all modern x86 and some Arm processors. The micro-op cache on x86 is effectively the only way an x86 processor can get full IPC performance, and that's because it contains predecoded instructions. We don't know the formats here, but we can guarantee that they are fixed-length instructions and that they have branch instructions annotated. Yeah sure, th
Is the Mill architecture solving problems orthogonal to this?
Not if you have a complex ISA like x86 and want a very wide decode.
Not really an issue under most usage scenarios. x86 suffers mostly from the cost and complexity of the decode unit. Once decoded instructions are cached, the performance penalty mostly disappears.
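As a rough illustration of the point above (decode cost is paid once, then cached), here is a minimal sketch of a decoded-instruction cache. Everything in it is an assumption made for illustration: `ToyUopCache`, `toy_decode`, `run_loop`, and the toy encoding (the low two bits of the first byte give the instruction length) are hypothetical and do not describe any real x86 micro-op cache format.

```python
from typing import Callable, Dict, Tuple

# Hypothetical decoded form: (opaque uop tuple, address of the next instruction).
Decoded = Tuple[Tuple[str, int], int]


def toy_decode(code: bytes, pc: int) -> Decoded:
    """Toy decoder. Assumed encoding: low 2 bits of the first byte = length - 1."""
    length = (code[pc] & 0x03) + 1
    return (("uop", code[pc]), pc + length)


class ToyUopCache:
    """Caches decoded instructions by address so repeated fetches skip the decoder."""

    def __init__(self, decode: Callable[[bytes, int], Decoded]) -> None:
        self._decode = decode
        self._lines: Dict[int, Decoded] = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, code: bytes, pc: int) -> Decoded:
        entry = self._lines.get(pc)
        if entry is None:
            # Miss: pay the decode cost once and remember the result.
            self.misses += 1
            entry = self._decode(code, pc)
            self._lines[pc] = entry
        else:
            # Hit: the decoder is bypassed entirely.
            self.hits += 1
        return entry


def run_loop(cache: ToyUopCache, code: bytes, iterations: int) -> None:
    """Execute the same straight-line code repeatedly, like a hot loop body."""
    for _ in range(iterations):
        pc = 0
        while pc < len(code):
            _uops, pc = cache.fetch(code, pc)


if __name__ == "__main__":
    demo = ToyUopCache(toy_decode)
    run_loop(demo, bytes([0x01, 0xAA, 0x00, 0x03, 1, 2, 3, 0x02, 9, 9]), iterations=3)
    print(demo.misses, demo.hits)
```

Only the first pass through the loop touches the decoder; later passes are served from the cache, which is the sense in which the penalty "mostly disappears" for hot code. A real design also has to handle capacity, eviction, and self-modifying code, none of which this sketch models.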
Can't argue with 1. For 2, it's still an instruction set separate from the uops, so although you might not be as sensitive to changes, you still get a level of indirection. For 3, it depends. You might gain in power if you use the newer one more, and you might as well make the older one simpler to achieve maybe 90% of the speed. But maybe the instruction decoding that counts is not that expensive compared to OoO branch predictors and 512-bit ALUs.
Most SIMD and FP instructions are not microcoded in a modern mainstream CPU, FWIW.
Certain Atom CPUs have neither speculative execution nor OoO execution.
I don't think that's an advantage these days. The bottleneck seems to be decoding instructions, and that's easier to parallelize if instructions are fixed width. Case in point: The big cores on Apple's A11 and A12 SoCs can decode 7 instructions per cycle. Intel's Skylake can do 5. Intel CPUs also have μop caches because decoding x86 is so expensive.
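The parallelization argument in the comment above can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_length` is a hypothetical encoding (low two bits of the first byte give the length), not x86, and `speculative_starts` is a simplified picture of length-guessing at every byte offset rather than a description of any specific decoder.

```python
from typing import Callable, List

LengthFn = Callable[[bytes, int], int]


def toy_length(code: bytes, pc: int) -> int:
    """Hypothetical variable-length encoding: 1 to 4 bytes, from the first byte."""
    return (code[pc] & 0x03) + 1


def fixed_width_starts(code: bytes, width: int = 4) -> List[int]:
    """Fixed width: every start offset is known up front, independent of content."""
    return list(range(0, len(code) - width + 1, width))


def variable_length_starts(code: bytes, length_of: LengthFn) -> List[int]:
    """Variable length: each start depends on the length of the previous instruction."""
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        pc += length_of(code, pc)  # serial dependency on the prior decode
    return starts


def speculative_starts(code: bytes, length_of: LengthFn) -> List[int]:
    """Guess a length at every byte offset, then pick out the real chain."""
    # Step 1: independent per-offset work (parallelizable, but mostly wasted).
    lengths = [length_of(code, i) for i in range(len(code))]
    # Step 2: selecting the valid starts is still a chain through those guesses.
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        pc += lengths[pc]
    return starts


if __name__ == "__main__":
    sample = bytes([0x01, 0xAA, 0x00, 0x03, 1, 2, 3, 0x02, 9, 9])
    print(fixed_width_starts(bytes(16)))
    print(variable_length_starts(sample, toy_length))
    print(speculative_starts(sample, toy_length))
```

With a fixed width, N decoders can each be handed an offset immediately. With variable lengths, either the boundaries are found serially or lengths are computed at every byte and most of that work is thrown away, which is one way to read the area and power cost the comment alludes to.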
Games console CPUs support that kind of instruction, don't they? To some extent, didn't Intel go down this road with VLIW: trying to shift the burden of making code fast onto the compiler, instead of the CPU?