The Most Useful Statistical Test You Didn't Learn in School

In performance work, you will often encounter distributions that are weirdly shaped: fat-tailed distributions, distributions with a hard lower bound at a non-zero value, and distributions that are just plain odd. Latency distributions in particular very often have a 99th percentile that is much further from the mean than the 1st percentile is. These sorts of asymmetric, fat-tailed distributions come with the territory.

Oftentimes, when performance engineers need to be scientific about their work, they will take samples of these distributions and put them into a $t$-test to get a $p$-value for the significance of their improvements. That is what you learned in a basic statistics or lab science class, so why not? Unfortunately, the world of computers is more complicated than the beer quality experiments for which the $t$-test was invented, and its data often violates one of the test's core assumptions: that the sample means are normally distributed. When you have a lot of samples, the central limit theorem can make this hold, but it often doesn't.
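To make the contrast concrete, here is a minimal sketch of one common rank-based alternative, the Mann-Whitney U test. This excerpt does not say which test the full post recommends, so treat that choice as an assumption. The function name `mann_whitney_u` is hypothetical, and the z-score uses the plain normal approximation without a tie correction.

```python
import math


def mann_whitney_u(a, b):
    """Return (U for sample a, z-score under the normal approximation).

    Sketch only: the z-score omits the tie correction to the variance.
    """
    # Pool both samples, tagging each value with the sample it came from.
    pooled = sorted([(x, 0) for x in a] + [(x, 1) for x in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at i; each gets the average rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n1, n2 = len(a), len(b)
    # Sum of the ranks that belong to sample a.
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, (u1 - mean_u) / sd_u
```

Because the test only looks at ranks, the outlier in `before = [10, 11, 12, 13, 500]` counts as one high rank instead of dragging the mean around: comparing it with `after = [8, 9, 10, 11, 12]` gives U = 20.5 out of a possible 25. For real work, a library implementation such as SciPy's `scipy.stats.mannwhitneyu` also handles tie corrections and exact p-values.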

What Happened with FPGA Acceleration?

In 2018, I made the jump from being primarily an FPGA hardware engineer to being primarily a software engineer. At the time, things were looking great for FPGA acceleration, with AWS and later Azure bringing in VMs with FPGAs and the two big FPGA vendors setting their sights on application acceleration. Almost 5 years later, I am working on another project with FPGAs, this time a cloud-oriented one. That has inspired me to write a retrospective on the last 5 years of what we thought would be an FPGA acceleration boom.

Teach Your Kids Bridge

A post recently made the rounds on Hacker News claiming that you should teach your kids poker, not chess. The comments on that post go through a lot of the reasons why poker is a bad game to teach your children, but I felt well suited to opine on this topic and to explain why, of all the alternatives, duplicate bridge is the best game for practicing the life skills involved in business and programming.

Fixed Point Arithmetic

When we think of how to represent fractional numbers in code, we reach for double and float, and almost never reach for anything else. There are several alternatives, including the constructive real numbers used in calculators, and rational numbers. One alternative predates all of these, including floating point, and can actually compute faster than floating point. That alternative is fixed point: a primitive way of representing fractions that offers none of the conveniences of float, but lets you do fractional computations more quickly and efficiently. Fixed point is still used in some situations today, and it can be a potent tool in your arsenal as a programmer if you find yourself working with math at high speed.
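As a minimal sketch of the idea, assuming a binary Q16.16 format (16 integer bits, 16 fractional bits), a fixed-point number is just an integer scaled by 2^16. The helper names below are hypothetical, and Python's unbounded integers stand in for the 32-bit and 64-bit registers a C implementation would use.

```python
FRAC_BITS = 16          # assumed Q16.16 format: 16 fractional bits
ONE = 1 << FRAC_BITS    # the fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    """Convert a float to fixed point, rounding to the nearest step."""
    return int(round(x * ONE))


def to_float(f: int) -> float:
    """Convert a fixed-point value back to a float."""
    return f / ONE


def fx_add(a: int, b: int) -> int:
    # Both operands share the same scale, so this is a plain integer add.
    return a + b


def fx_mul(a: int, b: int) -> int:
    # The raw product carries 2 * FRAC_BITS fractional bits; shift back down.
    # Python's >> on negative ints floors, like an arithmetic right shift.
    return (a * b) >> FRAC_BITS


def fx_div(a: int, b: int) -> int:
    # Pre-scale the numerator so the quotient keeps FRAC_BITS fractional bits.
    return (a << FRAC_BITS) // b
```

The key design point is that addition and subtraction are ordinary integer operations, while multiplication and division need one extra shift to restore the scale. In C you would typically store values in `int32_t` and widen to `int64_t` for the intermediate product; note that C's integer division truncates toward zero, while Python's `//` rounds toward negative infinity.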

You (Probably) Shouldn't Use a Lookup Table

I have been working on another post recently, also related to division, but I wanted to address a comment I got from several people on the previous division article. This comment shows up under many articles about using math to do things with chars and shorts. It is: “why are you doing all of this when you can just use a lookup table?”

Even worse, a stubborn and clever commenter may show you a benchmark where your carefully-crafted algorithm performs worse than their hamfisted lookup table. Surely you have made a mistake and you should just use a lookup table. Just look at the benchmark!
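For a concrete case, take dividing an unsigned 8-bit value by 10. The sketch below, written in Python for readability, compares a 256-entry lookup table against a multiply-and-shift. The constant 205/2048 is one standard choice for this divisor, and the function names are hypothetical.

```python
# A 256-entry table holding x // 10 for every possible unsigned 8-bit x.
DIV10_TABLE = bytes(x // 10 for x in range(256))


def div10_lut(x: int) -> int:
    """Divide an 8-bit value by 10 with a table lookup."""
    return DIV10_TABLE[x]


def div10_mul(x: int) -> int:
    """Divide an 8-bit value by 10 with a multiply and a shift.

    205 / 2048 is slightly larger than 1/10; the error stays small enough
    that the floor is exact for every x in 0..255.
    """
    return (x * 205) >> 11


# Exhaustively confirm both versions agree with true division.
assert all(div10_lut(x) == div10_mul(x) == x // 10 for x in range(256))
```

Both versions are correct, so the question is cost. In a tight benchmark loop, the 256-byte table stays resident in the L1 cache and a lookup looks nearly free; in a larger program the same table competes with everything else for cache space, which a multiply and a shift never do.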

Who Controls a DAO?

In honor of April Fools’ Day, I decided to write about a blockchain topic. The crypto economy is in the process of speedrunning its way from zero to a modern economy, and when you move that fast, a few things have to break along the way. One of those things is corporate governance.

Matt Levine’s “Money Stuff” is a financial newsletter that I can’t recommend enough. If you are at all interested in finance, stocks, and markets, it is a funny and informative read. One of the recurring topics of Money Stuff is “who controls a company?” Quoting a bit of the newsletter:

Python is Like Assembly

Python and Assembly have one thing in common: if you are a professional software engineer, they are both languages that you should probably know how to read, but be terrified to write. These languages seem to be (and are) at opposite ends of the spectrum: one is almost machine code, and the other is almost a scripting language. One is beginner-friendly and the other is seen as hostile even to experts. One is viciously versatile with tons of libraries and ports, and the other is ridiculously limited in its capabilities. However, when you are creating production software, both are the wrong tool for the job.

Racing the Hardware: 8-bit Division

Occasionally, I like to peruse uops.info. It is a great resource for micro-optimization: it benchmarks every x86 instruction on every microarchitecture and compiles the results. Every time I look at its tables, one thing sticks out to me: the DIV instruction. On a Coffee Lake CPU, an 8-bit DIV takes a long time: 25 cycles. Cannon Lake and Ice Lake do a lot better, and so does AMD. We know that divider designs differ between microarchitectures, and aggregating all of the performance numbers for an 8-bit DIV, we see:
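One well-known way to sidestep a slow DIV for 8-bit operands is to multiply by a precomputed reciprocal and shift. The sketch below assumes a table of ceil(2^16 / d) for every nonzero 8-bit divisor d. With 8-bit operands, the rounding error of that reciprocal is always too small to change the floor, and the exhaustive check at the end confirms it. This illustrates the general reciprocal trick; it is not necessarily the approach the full article takes, and `div8` and `RECIP` are hypothetical names.

```python
# RECIP[d] = ceil(2**16 / d) for d in 1..255; index 0 is unused (division by zero).
RECIP = [0] + [-(-(1 << 16) // d) for d in range(1, 256)]


def div8(x: int, d: int) -> int:
    """Compute x // d for unsigned 8-bit x and nonzero d with no divide."""
    # The reciprocal overestimates 1/d by less than 2**-16, and
    # x * 2**-16 < 1/255 <= 1/d, so the overshoot never reaches the next integer.
    return (x * RECIP[d]) >> 16


# Exhaustively check every 8-bit dividend against every nonzero 8-bit divisor.
assert all(div8(x, d) == x // d for x in range(256) for d in range(1, 256))
```

In C, every product fits comfortably in a 32-bit integer. The table entry for d = 1 is 65536, so the entries need more than 16 bits each; that table is the price paid for avoiding the hardware divider.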

The Meaning of Speed

A lot of the time, when engineers think of performance work, we think about looking at benchmarks and making the numbers smaller. We assume that we are benchmarking the right pieces of code, and we take it for granted that reducing those numbers is a benefit, but also “the root of all evil” if done prematurely. If you are a performance-focused software engineer, or you are working with performance engineers, it can help to understand the value proposition of performance and when to work on it.

Performance Numbers Worth Knowing

When you design software to achieve a particular level of performance, it can be a good idea to be familiar with the general speed regimes you are working with: fundamental limitations like storage devices and networks can drive software architecture. Here is a set of common benchmark numbers that can help you anchor performance conversations and think about the components that your software will interact with. As with all guidelines, these numbers are all slightly wrong, but still useful.