Introduction to Micro-Optimization
A modern CPU is an incredible machine. It can execute many instructions at the same time, it can re-order instructions to ensure that memory accesses and dependency chains don’t impact performance too much, it contains hundreds of registers, and it has huge areas of silicon devoted to predicting which branches your code will take. However, if you have a tight loop and you are interested in optimizing the hell out of it, the same mechanisms that make your code run fast can make your job very difficult.