Performance

You (Probably) Shouldn't Use a Lookup Table

I have been working on another post recently, also related to division, but I wanted to address a comment I got from several people on the previous division article. This comment invariably follows articles about using math to do things with chars and shorts. It is: “why are you doing all of this when you can just use a lookup table?” Even worse, a stubborn and clever commenter may show you a benchmark where your carefully crafted algorithm performs worse than their ham-fisted lookup table.

Racing the Hardware: 8-bit Division

Occasionally, I like to peruse uops.info. It is a great resource for micro-optimization: it benchmarks every x86 instruction on every microarchitecture and compiles the results. Every time I look at this table, one thing sticks out to me: the DIV instruction. On a Coffee Lake CPU, an 8-bit DIV takes a long time: 25 cycles. Cannon Lake and Ice Lake do a lot better, and so does AMD. We know that divider design differs between microarchitectures, and aggregating all of the performance numbers for an 8-bit DIV, we see:

The Meaning of Speed

A lot of the time, when engineers think of performance work, we think about looking at benchmarks and making the numbers smaller. We assume that we are benchmarking the right pieces of code, and we take it for granted that reducing some of those numbers is a benefit, but also “the root of all evil” if done prematurely. If you are a performance-focused software engineer, or you are working with performance engineers, it can help to understand the value proposition of performance and when to work on it.

Performance Numbers Worth Knowing

When you design software to achieve a particular level of performance, it can be a good idea to be familiar with the general speed regimes you are working with: fundamental limitations like storage devices and networks can drive software architecture. Here is a set of common benchmark numbers that can help you anchor performance conversations and think about the components that your software will interact with. As with all guidelines, these numbers are all slightly wrong, but still useful.