Racing the Hardware: 8-bit Division
Occasionally, I like to peruse uops.info. It is a great resource for micro-optimization:
the site benchmarks every x86 instruction on every architecture and compiles the results. Every time I look at this table,
there is one thing that sticks out to me: the DIV instruction. On a Coffee Lake CPU, an 8-bit DIV
takes a long time: 25 cycles. Cannon Lake and Ice Lake do a lot better, and so does AMD. We know that divider
architecture differs between microarchitectures; aggregating all of the performance numbers for an
8-bit DIV, we see: