Performance Numbers Worth Knowing
When you design software to achieve a particular level of performance, it helps to be familiar with the general speed regimes you are working with: fundamental limits of components such as storage devices and networks can drive software architecture. Here is a set of common benchmark numbers that can help you anchor performance conversations and reason about the components your software will interact with. As with all guidelines, these numbers are all slightly wrong, but still useful.
Throughputs
Some common byte-level throughputs are shown in the table below. All of the computing functions (e.g., compression, memcpy) are for one core of a modern server CPU.
Type | Component | Throughput | Time for 1 MB |
---|---|---|---|
Network | Average US Cable Internet Upload | 1.25 MB/s | 800 ms |
Network | Slow WiFi (802.11g) | 6.75 MB/s | 150 ms |
Algorithm | Tight Compression (gzip -9) | 10 MB/s | 100 ms |
Network | Average US Cable Internet Download | 12.5 MB/s | 80 ms |
Algorithm | Compression (gzip -1) | 64 MB/s | 16 ms |
Network | WiFi (802.11n) | 75 MB/s | 13 ms |
Network | Gigabit Ethernet | 125 MB/s | 8 ms |
Storage | Hard Drive | 150-200 MB/s | 5-7 ms |
Algorithm | Decompression (gzip) | 300 MB/s | 3.3 ms |
Network | Fast WiFi (802.11ax) | 440 MB/s | 2.3 ms |
Algorithm | Fast Compression (lz4) | 500 MB/s | 2 ms |
Algorithm | SHA-512 | 600 MB/s | 1.6 ms |
Storage | SATA 3.0 SSD | 750 MB/s | 1.3 ms |
Algorithm | SHA-1 | 900 MB/s | 1.1 ms |
I/O | PCIe gen 3 x1 (WiFi Card) | 1 GB/s | 1 ms |
Network | 10 Gigabit Ethernet | 1.25 GB/s | 800 µs |
Algorithm | AES-GCM Encryption | 2 GB/s | 500 µs |
I/O | PCIe gen 4 x1 (WiFi Card) | 2 GB/s | 500 µs |
Algorithm | JSON parsing with simdjson | 3 GB/s | 330 µs |
Algorithm | Fast Decompression (lz4) | 3 GB/s | 330 µs |
I/O | PCIe gen 3 x4 (NVMe SSD) | 4 GB/s | 250 µs |
I/O | PCIe gen 4 x4 (NVMe SSD) | 8 GB/s | 125 µs |
Memory | DDR4-3200 DRAM Channel Actual (x64) | ~12 GB/s | 83 µs |
Network | 100 Gigabit Ethernet | 12.5 GB/s | 80 µs |
I/O | PCIe gen 3 x16 (GPU or accelerator) | 16 GB/s | 63 µs |
Memory | DDR5-4800 DRAM Channel Actual (x64) | ~20 GB/s | 50 µs |
Algorithm | CRC32C Checksum | 25 GB/s | 40 µs |
Memory | DDR4-3200 DRAM Channel Theoretical Max (x64) | 25.6 GB/s | 40 µs |
I/O | PCIe gen 4 x16 (GPU or accelerator) | 32 GB/s | 32 µs |
Memory | DDR5-4800 DRAM Channel Theoretical Max (x64) | 38.4 GB/s | 26 µs |
Algorithm | memcpy | 50 GB/s | 20 µs |
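
The "Time for 1 MB" column is just size divided by throughput, and the same arithmetic can be used to spot the slowest stage in a pipeline. The following is a minimal sketch, assuming decimal units (1 MB = 10^6 bytes); the names `THROUGHPUT`, `transfer_time`, `bottleneck`, and `format_duration` are hypothetical helpers introduced here for illustration, and the dictionary holds only a few rows copied from the table.

```python
MB = 1_000_000      # assumption: decimal megabyte
GB = 1_000 * MB     # assumption: decimal gigabyte

# A few rows copied from the throughput table, in bytes per second.
THROUGHPUT = {
    "gzip -1": 64 * MB,
    "gigabit ethernet": 125 * MB,
    "lz4 compression": 500 * MB,
    "nvme ssd (pcie gen 3 x4)": 4 * GB,
}


def transfer_time(size_bytes, bytes_per_second):
    """Seconds needed to move size_bytes at a steady throughput."""
    return size_bytes / bytes_per_second


def bottleneck(stages):
    """Name of the slowest stage in a streaming pipeline.

    When stages overlap (streaming), sustained throughput is limited
    by the slowest one.
    """
    return min(stages, key=lambda name: THROUGHPUT[name])


def format_duration(seconds):
    """Render a duration with a unit that keeps the number readable."""
    for unit, scale in (("s", 1.0), ("ms", 1e-3), ("µs", 1e-6), ("ns", 1e-9)):
        if seconds >= scale:
            return f"{seconds / scale:.3g} {unit}"
    return f"{seconds / 1e-12:.3g} ps"


if __name__ == "__main__":
    for name, rate in THROUGHPUT.items():
        print(f"{name:26s} {format_duration(transfer_time(MB, rate))} per MB")

    pipeline = ["nvme ssd (pcie gen 3 x4)", "gzip -1", "gigabit ethernet"]
    print("slowest stage:", bottleneck(pipeline))
```

In this example pipeline (read from NVMe, compress with gzip -1, send over Gigabit Ethernet), the compressor rather than the network is the limiting stage on a single core, which is the kind of conclusion these numbers are meant to make quick to reach.
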
Latencies
Some common latencies are shown in the table below. Most of these are fundamental, but several are the product of the design of protocols and systems. All of these are shown assuming that they are uncongested (so there is no queueing delay), and network delays are shown as 1/2 of round-trip time, representing one-way latency.
Type | Component/Process | Latency |
---|---|---|
CPU | CPU Instruction (1 cycle) | 400 ps |
CPU | L1 Cache Access | 1.2 ns |
CPU | Branch Misprediction | 4 ns |
CPU | L2 Cache Access | 4 ns |
CPU | Atomic Instruction | ~10 ns |
CPU | L3 Cache Access | 15-20 ns |
CPU | DRAM Access (Cache Miss) | 50-100 ns |
LAN | Cut-through Switch | 100 ns |
Device | Low-latency Network Card | 500 ns |
Device | PCIe Accelerator/GPU Access | 1 µs |
Serialization | 1.5 kB Store-and-forward Delay on 10 Gigabit Ethernet | 1.2 µs |
Device | Datacenter Network Card (SFP, QSFP, etc.) | 1.5 µs |
LAN | Datacenter Switch | 2 µs |
LAN | Medium-sized Datacenter Network | 10 µs |
Device | Gigabit Ethernet Network Card (1000BASE-T) | 10 µs |
Serialization | 1.5 kB Store-and-forward Delay on Gigabit Ethernet | 12 µs |
LAN | Copper Gigabit Ethernet Switch | 20 µs |
Cloud | Intra-zone Cloud Network | 20 µs |
Storage | SSD Read | 50-100 µs |
Cloud | Inter-zone Cloud Network | 500 µs |
Edge Network | Low-interference WiFi Connection | 1 ms |
Serialization | 1.5 kB Store-and-forward Delay on 10 Megabit Ethernet | 1.2 ms |
Cloud | Inter-region Cloud Network | 2.5 ms |
Storage | 7200 RPM Hard Drive Rotation (average, half revolution) | 4.2 ms |
Edge Network | High-interference WiFi Connection | 5 ms |
Edge Network | DOCSIS 3.0 Cable Modem | 5 ms |
Storage | Hard Drive Seek | 10 ms |
WAN | US East Coast to West Coast | 20 ms |
WAN | US East Coast to UK | 30 ms |
WAN | US West Coast to Chile | 50 ms |
WAN | UK to India | 70 ms |
WAN | US West Coast to Hong Kong | 100 ms |
WAN | US to India | 150 ms |
WAN | UK to Hong Kong | 150 ms |
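
Several latency rows follow from simple arithmetic rather than measurement. The sketch below reproduces three of them under stated assumptions: a 1.5 kB frame is taken as 1500 bytes, the 400 ps cycle is taken to imply a 2.5 GHz clock, and the 4.2 ms hard drive figure is read as the average wait of half a revolution. The function names are hypothetical helpers introduced here for illustration.

```python
def serialization_delay(frame_bytes, link_bits_per_second):
    """Seconds to clock a whole frame onto the wire (store-and-forward)."""
    return frame_bytes * 8 / link_bits_per_second


def average_rotational_latency(rpm):
    """Average wait for a sector: half of one full platter revolution."""
    seconds_per_revolution = 60.0 / rpm
    return seconds_per_revolution / 2


def cycle_time(clock_hz):
    """Duration of one CPU clock cycle."""
    return 1.0 / clock_hz


def round_trip(one_way_seconds):
    """The table lists one-way latency; a request/response pays it twice."""
    return 2 * one_way_seconds


if __name__ == "__main__":
    frame = 1500  # assumption: "1.5 kB" means a 1500-byte frame
    for label, bits_per_second in (
        ("10 Gigabit", 10e9),
        ("Gigabit", 1e9),
        ("10 Megabit", 10e6),
    ):
        delay_us = serialization_delay(frame, bits_per_second) * 1e6
        print(f"{label:11s} {delay_us:.4g} µs")

    print(f"7200 RPM    {average_rotational_latency(7200) * 1e3:.2f} ms")
    print(f"2.5 GHz     {cycle_time(2.5e9) * 1e12:.0f} ps")
    print(f"20 ms one-way is {round_trip(20e-3) * 1e3:.0f} ms round trip")
```

Because the table reports network delays as half of round-trip time, any estimate of a request/response exchange should double the listed figure, as `round_trip` does.
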