Performance Numbers Worth Knowing

When you design software to hit a particular level of performance, it helps to know the general speed regimes you are working in: fundamental limits of storage devices and networks can drive software architecture. Here is a set of common benchmark numbers that can help you anchor performance conversations and reason about the components your software will interact with. As with all guidelines, these numbers are all slightly wrong, but still useful.

Throughputs

Some common byte-level throughputs are shown in the table below. All of the computing functions (e.g., compression, memcpy) are measured on one core of a modern server CPU.

| Type | Component | Throughput | Time for 1 MB |
| --- | --- | --- | --- |
| Network | Average US Cable Internet Upload | 1.25 MB/s | 800 ms |
| Network | Slow WiFi (802.11g) | 6.75 MB/s | 150 ms |
| Algorithm | Tight Compression (gzip -9) | 10 MB/s | 100 ms |
| Network | Average US Cable Internet Download | 12.5 MB/s | 80 ms |
| Algorithm | Compression (gzip -1) | 64 MB/s | 16 ms |
| Network | WiFi (802.11n) | 75 MB/s | 13 ms |
| Network | Gigabit Ethernet | 125 MB/s | 8 ms |
| Storage | Hard Drive | 150-200 MB/s | 5-7 ms |
| Algorithm | Decompression (gzip) | 300 MB/s | 3.3 ms |
| Network | Fast WiFi (802.11ax) | 440 MB/s | 2.3 ms |
| Algorithm | Fast Compression (lz4) | 500 MB/s | 2 ms |
| Algorithm | SHA-512 | 600 MB/s | 1.7 ms |
| Storage | SATA 3.0 SSD | 750 MB/s | 1.3 ms |
| Algorithm | SHA-1 | 900 MB/s | 1.1 ms |
| I/O | PCIe gen 3 x1 (WiFi Card) | 1 GB/s | 1 ms |
| Network | 10 Gigabit Ethernet | 1.25 GB/s | 800 µs |
| Algorithm | AES-GCM Encryption | 2 GB/s | 500 µs |
| I/O | PCIe gen 4 x1 (WiFi Card) | 2 GB/s | 500 µs |
| Algorithm | JSON parsing with simdjson | 3 GB/s | 330 µs |
| Algorithm | Fast Decompression (lz4) | 3 GB/s | 330 µs |
| I/O | PCIe gen 3 x4 (NVMe SSD) | 4 GB/s | 250 µs |
| I/O | PCIe gen 4 x4 (NVMe SSD) | 8 GB/s | 125 µs |
| Memory | DDR4-3200 DRAM Channel Actual (x64) | ~12 GB/s | 83 µs |
| Network | 100 Gigabit Ethernet | 12.5 GB/s | 80 µs |
| I/O | PCIe gen 3 x16 (GPU or accelerator) | 16 GB/s | 63 µs |
| Memory | DDR5-4800 DRAM Channel Actual (x64) | ~20 GB/s | 50 µs |
| Algorithm | CRC32C Checksum | 25 GB/s | 40 µs |
| Memory | DDR4-3200 DRAM Channel Theoretical Max (x64) | 25.6 GB/s | 39 µs |
| I/O | PCIe gen 4 x16 (GPU or accelerator) | 32 GB/s | 31 µs |
| Memory | DDR5-4800 DRAM Channel Theoretical Max (x64) | 38.4 GB/s | 26 µs |
| Algorithm | memcpy | 50 GB/s | 20 µs |
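
The "Time for 1 MB" column is simply the payload size divided by the throughput, so the same arithmetic can be reused for other payload sizes or for a chain of components. The sketch below is one way to do that, not a benchmark: the function names are made up for illustration, 1 MB is assumed to mean 10^6 bytes, and every stage is assumed to process the same number of bytes (which is not true if a compression step sits in the middle).

```python
MB = 1_000_000        # decimal megabyte (assumption, matches the table's rounding)
GB = 1_000_000_000    # decimal gigabyte

def transfer_time_s(size_bytes, throughput_bytes_per_s):
    """Seconds needed to move size_bytes at a sustained throughput."""
    return size_bytes / throughput_bytes_per_s

def pipeline_time_s(size_bytes, stage_throughputs):
    """Total time when stages run strictly one after another (no overlap)."""
    return sum(transfer_time_s(size_bytes, t) for t in stage_throughputs)

def streaming_time_s(size_bytes, stage_throughputs):
    """Approximate time when stages fully overlap: the slowest stage dominates."""
    return transfer_time_s(size_bytes, min(stage_throughputs))

# Throughputs taken from the table above, in bytes per second.
GIGABIT_ETHERNET = 125 * MB
SHA1 = 900 * MB

# Hypothetical workload: receive 100 MB over Gigabit Ethernet, then hash it.
stages = [GIGABIT_ETHERNET, SHA1]
print(f"sequential: {pipeline_time_s(100 * MB, stages) * 1e3:.0f} ms")
print(f"streaming:  {streaming_time_s(100 * MB, stages) * 1e3:.0f} ms")
```

The gap between the two estimates is a quick way to see whether overlapping stages is worth the added complexity: when one stage is much slower than the rest, as the network is here, streaming saves little.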

Latencies

Some common latencies are shown in the table below. Most of these are fundamental, but several are the product of the design of protocols and systems. All of them assume an uncongested path (so there is no queueing delay), and network delays are shown as half of the round-trip time, representing one-way latency.

| Type | Component/Process | Latency |
| --- | --- | --- |
| CPU | CPU Instruction (1 cycle) | 400 ps |
| CPU | L1 Cache Access | 1.2 ns |
| CPU | Branch Misprediction | 4 ns |
| CPU | L2 Cache Access | 4 ns |
| CPU | Atomic Instruction | ~10 ns |
| CPU | L3 Cache Access | 15-20 ns |
| CPU | DRAM Access (Cache Miss) | 50-100 ns |
| LAN | Cut-through Switch | 100 ns |
| Device | Low-latency Network Card | 500 ns |
| Device | PCIe Accelerator/GPU Access | 1 µs |
| Serialization | 1.5 kB Store-and-forward Delay on 10 Gigabit Ethernet | 1.2 µs |
| Device | Datacenter Network Card (SFP, QSFP, etc.) | 1.5 µs |
| LAN | Datacenter Switch | 2 µs |
| LAN | Medium-sized Datacenter Network | 10 µs |
| Device | Gigabit Ethernet Network Card (1000BASE-T) | 10 µs |
| Serialization | 1.5 kB Store-and-forward Delay on Gigabit Ethernet | 12 µs |
| LAN | Copper Gigabit Ethernet Switch | 20 µs |
| Cloud | Intra-zone Cloud Network | 20 µs |
| Storage | SSD Read | 50-100 µs |
| Cloud | Inter-zone Cloud Network | 500 µs |
| Edge Network | Low-interference WiFi Connection | 1 ms |
| Serialization | 1.5 kB Store-and-forward Delay on 10 Megabit Ethernet | 1.2 ms |
| Cloud | Inter-region Cloud Network | 2.5 ms |
| Storage | 7200 RPM Hard Drive Rotation (average, half a revolution) | 4.2 ms |
| Edge Network | High-interference WiFi Connection | 5 ms |
| Edge Network | DOCSIS 3.0 Cable Modem | 5 ms |
| Storage | Hard Drive Seek | 10 ms |
| WAN | US East Coast to West Coast | 20 ms |
| WAN | US East Coast to UK | 30 ms |
| WAN | US West Coast to Chile | 50 ms |
| WAN | UK to India | 70 ms |
| WAN | US West Coast to Hong Kong | 100 ms |
| WAN | US to India | 150 ms |
| WAN | UK to Hong Kong | 150 ms |
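
A few of the latencies above can be derived rather than measured. The sketch below reproduces the store-and-forward rows (frame size in bits divided by link rate) and the 7200 RPM rotation figure (taken here to be the average wait, half of one revolution), then combines a one-way latency with a throughput for a rough request estimate. The function names are illustrative, and `request_time_s` is a deliberately crude model: one round trip plus streaming time, ignoring connection setup, congestion control, and queueing.

```python
def serialization_delay_s(frame_bytes, link_bits_per_s):
    """Time to clock a whole frame onto the wire (store-and-forward delay per hop)."""
    return frame_bytes * 8 / link_bits_per_s

def avg_rotational_latency_s(rpm):
    """Average wait for a sector to come around: half of one full revolution."""
    return 0.5 * 60.0 / rpm

def request_time_s(one_way_latency_s, size_bytes, throughput_bytes_per_s):
    """Rough uncongested estimate: one round trip plus time to stream the payload."""
    return 2 * one_way_latency_s + size_bytes / throughput_bytes_per_s

# 1.5 kB frame on 10 Gigabit, Gigabit, and 10 Megabit Ethernet.
for name, rate in [("10 GbE", 10e9), ("1 GbE", 1e9), ("10 MbE", 10e6)]:
    print(f"{name}: {serialization_delay_s(1500, rate) * 1e6:.1f} µs")

print(f"7200 RPM: {avg_rotational_latency_s(7200) * 1e3:.1f} ms")

# Hypothetical request: fetch 1 MB coast to coast (20 ms one way)
# over an average cable download link (12.5 MB/s).
print(f"request: {request_time_s(20e-3, 1_000_000, 12.5e6) * 1e3:.0f} ms")
```

In the last estimate the round trip contributes 40 ms and the transfer 80 ms, so neither latency nor throughput can be ignored at that payload size.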
