Mathematical Algorithms

A Cryptographically Secret Santa

'Twas about 4-6 weeks before Christmas, and all through the math department, not a creature was stirring, not even a plucky young undergrad. Cryptography professors Alice and Bob sat at the elliptically-curved conference table to plan the department’s secret Santa. Mallory, the department secretary, had been given the task of organizing it last year, and somehow managed to get three gifts while leaving several people disappointed. This year, the math department thus resolved to do its secret Santa without a trusted party.

The Computer Architecture of AI (in 2024)

Over the last year, as a person with a hardware background, I have heard a lot of complaints about Nvidia’s dominance of the machine learning market, along with questions about whether I can build chips to make the situation better. While the amount of money I would expect it to take is less than $7 trillion, hardware-accelerating this wave of AI will be a very tough problem, much tougher than the last wave focused on CNNs, and there is a good reason that Nvidia has become the leader in this field with few competitors. While the inference of CNNs used to be a math problem, the inference of large language models has become a computer architecture problem: figuring out how to coordinate memory and I/O with compute to get the best performance out of the system.

The Most Useful Statistical Test You Didn't Learn in School

In performance work, you will often find many distributions that are weirdly shaped: fat-tailed distributions, distributions with a hard lower bound at a non-zero number, and distributions that are just plain odd. Particularly when you look at latency distributions, it is extremely common for the 99th percentile to be a lot further from the mean than the 1st percentile. These sorts of asymmetric fat-tailed distributions come with the business.

Oftentimes, when performance engineers need to be scientific about their work, they will take samples of these distributions and put them into a $t$-test to get a $p$-value for the significance of their improvements. That is what you learned in a basic statistics or lab science class, so why not? Unfortunately, the world of computers is more complicated than the beer quality experiments for which the $t$-test was invented, and violates one of its core assumptions: that the sample means are normally distributed. When you have a lot of samples, this can hold, but it often doesn’t.

Fixed Point Arithmetic

When we think of how to represent fractional numbers in code, we reach for double and float, and almost never reach for anything else. There are several alternatives, including the constructive real numbers used in calculators, and rational numbers. One alternative predates all of these, including floating point, and actually allows you to compute faster than floating point does. That alternative is fixed point: a primitive form of decimal that does not offer any of the conveniences of float, but allows you to do decimal computations more quickly and efficiently. Fixed point is still used in some situations today, and it can be a potent tool in your arsenal as a programmer if you find yourself working with math at high speed.
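As a rough illustration of the idea, here is a minimal sketch of fixed-point arithmetic on plain integers. It assumes a binary Q16.16 layout (16 fractional bits), which is only one of many possible formats and is not necessarily the one the full post uses; the names `FRAC_BITS`, `to_fixed`, `fx_mul`, and `fx_div` are hypothetical helpers invented for this example.

```python
# Assumption: Q16.16 binary fixed point, i.e. a value x is stored as the
# integer round(x * 2**16). All names here are illustrative only.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # the fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    """Convert a float to its nearest Q16.16 integer representation."""
    return int(round(x * ONE))


def to_float(a: int) -> float:
    """Convert a Q16.16 integer back to a float for display."""
    return a / ONE


def fx_add(a: int, b: int) -> int:
    """Addition needs no rescaling: both operands share the same scale."""
    return a + b


def fx_mul(a: int, b: int) -> int:
    """The raw product carries 32 fractional bits, so shift 16 back out."""
    return (a * b) >> FRAC_BITS


def fx_div(a: int, b: int) -> int:
    """Pre-scale the dividend so the quotient keeps 16 fractional bits."""
    return (a << FRAC_BITS) // b


if __name__ == "__main__":
    x = to_fixed(1.5)
    y = to_fixed(2.25)
    print(to_float(fx_add(x, y)))  # sum of 1.5 and 2.25
    print(to_float(fx_mul(x, y)))  # product of 1.5 and 2.25
    print(to_float(fx_div(to_fixed(3.0), to_fixed(2.0))))
```

Every operation here is an integer add, multiply, shift, or divide, which is where the speed advantage comes from. The cost is that range and precision are fixed up front, and the caller has to watch for overflow in languages with fixed-width integers (Python's integers are unbounded, so this sketch hides that concern).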

Racing the Hardware: 8-bit Division

Occasionally, I like to peruse uops.info. It is a great resource for micro-optimization: its authors benchmark every x86 instruction on every architecture and compile the results. Every time I look at this table, there is one thing that sticks out to me: the DIV instruction. On a Coffee Lake CPU, an 8-bit DIV takes a long time: 25 cycles. Cannon Lake and Ice Lake do a lot better, and so does AMD. We know that divider design differs between architectures, and aggregating all of the performance numbers for an 8-bit DIV, we see:

Constant-time Fibonacci

This is the second part in a 2-part series on the “Fibonacci” interview problem. We are building off of a previous post, so take a look at Part I if you haven’t seen it.

Previously, we examined the problem and constructed a logarithmic-time solution based on computing the power of a matrix. Now we will derive a constant-time solution using some more linear algebra. If you had trouble with the linear algebra in Part I, it may help to read up on matrices, matrix multiplication, and special matrix operations (specifically determinants and inverses) before moving on.
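For readers who want a refresher, the logarithmic-time approach recalled above can be sketched roughly as follows. This is not the code from Part I; it is a minimal illustration that assumes the convention $F_0 = 0$, $F_1 = 1$ and relies on the standard identity that the $n$-th power of the matrix [[1, 1], [1, 0]] is [[$F_{n+1}$, $F_n$], [$F_n$, $F_{n-1}$]]. The names `mat_mul` and `fib` are hypothetical.

```python
# Assumption: F(0) = 0, F(1) = 1. Matrices are 2x2 tuples of tuples.
Matrix = tuple  # ((a, b), (c, d))


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 2x2 integer matrices."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0],
         a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0],
         a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def fib(n: int) -> int:
    """Return F(n) using O(log n) matrix multiplications."""
    result = ((1, 0), (0, 1))  # identity matrix
    base = ((1, 1), (1, 0))    # the Fibonacci step matrix
    while n > 0:
        if n & 1:              # this bit of n is set: fold in the current power
            result = mat_mul(result, base)
        base = mat_mul(base, base)  # square to reach the next power of two
        n >>= 1
    return result[0][1]        # top-right entry of the n-th power is F(n)


if __name__ == "__main__":
    print([fib(i) for i in range(11)])
```

Note that "logarithmic" here counts matrix multiplications; with arbitrary-precision integers, each multiplication gets more expensive as the numbers grow.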

Less-than-linear Fibonacci

Few interview problems are as notorious as the “Fibonacci” interview question. At first glance, it seems good: Most people know something about the problem, and there are several clever ways to achieve a linear time solution. Usually, in interviews, the linear time solution is the expected solution. However, the Fibonacci problem is unique among interview problems in that the expected solution is not the optimal solution. There is an $O(1)$ solution, and to get there, we need a little bit of linear algebra.