As you build a computer system, little things start to show up: maybe that database query is awkward for the feature you are building, or you find your server getting bogged down transferring gigabytes of data in hexadecimal ASCII, or your app translates itself to Japanese on the fly for hundreds of thousands of separate users. These are places where your abstractions are misaligned - your app would be quantitatively better if it had a better DB schema, a way to transfer binary data, or native internationalization for your Japanese users.
I have recently been immersed in the theory and practice of random number generation while working on Arbitrand, a new high-quality true random number generation service hosted in AWS. Because of that, I am starting a sequence of blog posts on randomness and random number generators. This post is the first of the sequence, and focuses what random number generators are and how to test them. Formally, random number generators are systems that produce a stream of bits (or numbers) with two properties:
A modern CPU is an incredible machine. It can execute many instructions at the same time, it can re-order instructions to ensure that memory accesses and dependency chains don’t impact performance too much, it contains hundreds of registers, and it has huge areas of silicon devoted to predicting which branches your code will take. However, if you have a tight loop and you are interested in optimizing the hell out of it, the same mechanisms that make your code run fast can make your job very difficult.
Intel’s Optane memory modules launched with a lot of fanfare in 2015, and were recently discontinued, in 2022, with similar fanfare. It was a sad day for me, a lover of abstraction-breaking technologies, but it was forseeable and understandable. At the time of Optane’s launch, a lot of us were excited about the idea of having a new storage tier, sitting between DRAM and flash. It was announced as having DRAM endurance and speed with the persistence and size of flash.
A lot of ink is spent on the “monoliths vs. microservices” debate, but the real issue behind this debate is about whether distributed system architecture is worth the developer time and cost overheads. By thinking about the real operational considerations of our systems, we can get some insight into whether we actually need distributed systems for most things. We have all gotten so familiar with virtualization and abstractions between our software and the servers that run it.
In performance work, you will often find many distributions that are weirdly shaped: fat-tailed distributions, distributions with a hard lower bound at a non-zero number, and distributions that are just plain odd. Particularly when you look at latency distributions, it is extremely common for the 99th percentile to be a lot further from the mean than the 1st percentile. These sorts of asymmetric fat-tailed distributions come with the business. Often times, when performance engineers need to be scientific about their work, they will take samples of these distributions, and put them into into a $t$-test to get a $p$-value for the significance of their improvements.
In 2018, I took the jump from being primarily an FPGA hardware engineer to being primarily a software engineer. At the time, things were looking great for FPGA acceleration, with AWS and later Azure bringing in VMs with FPGAs and the two big FPGA vendors setting their sights on application acceleration. Almost 5 years later, I am working on another project with FPGAs, this time a cloud-oriented one. That has inspired me to write a retrospective on the last 5 years of what we thought would be an FPGA acceleration boom.
A post recently made the rounds on hacker news claiming that you should teach your kids poker, not chess. The comments on that post go through a lot of the reasons why poker is a bad game to teach your children, but I felt that I was well suited to opine on this topic, and explain why duplicate bridge is the best game for practicing the life skills involved in business and programming, compared to all of the alternatives.
When we think of how to represent fractional numbers in code, we reach for double and float, and almost never reach for anything else. There are several alternatives, including constructive real numbers that are used in calculators, and rational numbers. One alternative predates all of these, including floating point, and actually allows you to compute faster than when you use floating point numbers. That alternative is fixed point: a primitive form of decimal that does not offer any of the conveniences of float, but allows you to do decimal computations more quickly and efficiently.
I have been working on another post recently, also related to division, but I wanted to address a comment I got from several people on the previous division article. This comment invariably follows a lot of articles on using math to do things with chars and shorts. It is: “why are you doing all of this when you can just use a lookup table?” Even worse, a stubborn and clever commenter may show you a benchmark where your carefully-crafted algorithm performs worse than their hamfisted lookup table.