Python and Assembly have one thing in common: as a professional software engineer, they are both languages that you probably should know how to read, but be terrified to write. These languages seem to be (and are) at opposite ends of the spectrum: One is almost machine code, and the other is almost a scripting language. One is beginner-friendly and the other is seen as hostile to experts. One is viciously versatile with tons of libraries and ports, and the other is ridiculously limited in its capabilities.

Occasionally, I like to peruse uops.info. It is a great resource for micro-optimization: benchmark every x86 instruction on every architecture, and compile the results. Every time I look at this table, there is one thing that sticks out to me: the DIV instruction. On a Coffee Lake CPU, an 8-bit DIV takes a long time: 25 cycles. Cannon Lake and Ice Lake do a lot better, and so does AMD. We know that divider architecture is different between architectures, and aggregating all of the performance numbers for an 8-bit DIV, we see:

A lot of the time, when engineers think of performance work, we think about looking at benchmarks and making the numbers smaller. We anticipate that we are benchmarking the right pieces of code, and we take it for granted that reducing some of those numbers is a benefit, but also “the root of all evil” if done prematurely. If you are a performance-focused software engineer, or you are working with performance engineers, it can help to understand the value proposition of performance and when to work on it.

When you design software to achieve a particular level of performance, it can be a good idea to be familiar with the general speed regimes you are working with: fundamental limitations like storage devices and networks can drive software architecture. Here are a set of common benchmark numbers that can help you anchor performance conversations and think about the components that your software will interact with. As with all guidelines, these numbers are all slightly wrong, but still useful.

This is the second part in a 2-part series on the “Fibonacci” interview problem. We are building off of a previous post, so take a look at Part I if you haven’t seen it.
Previously, we examined the problem and constructed a logarithmic-time solution based on computing the power of a matrix. Now we will derive a constant time solution using some more linear algebra. If you had trouble with the linear algebra in part I, it may help to read up on matrices, matrix multiplicaiton, and special matrix operations (specifically determinants and inverses) before moving on.

Few interview problems are as notorious as the “Fibonacci” interview question. At first glance, it seems good: Most people know something about the problem, and there are several clever ways to achieve a linear time solution. Usually, in interviews, the linear time solution is the expected solution. However, the Fibonacci problem is unique among interview problems in that the expected solution is not the optimal solution. There is an $O(1)$ solution, and to get there, we need a little bit of linear algebra.

Hello everyone, and welcome to my blog.
I am an ex-Google senior engineer focused on systems programming and performance optimization. My past experience includes hardware engineering, numerical analysis, and high-frequency trading.
Here, we will be talking about software engineering, performance, computer systems and foundations, interesting math concepts, electrical engineering, hardware acceleration, FPGAs, and more. We may also branch out to non-technical topics including companies, organizational psychology, and the stock market.