Perfect Random Floating-Point Numbers
When I recently looked at the state of the art in floating-point random number generation, I was surprised to see a common procedure in many programming languages and libraries that is not really a floating-point algorithm:
- Generate a random integer with bits chosen based on the precision of the format.
- Convert to floating point.
- Divide to produce an output between 0 and 1.
In code, this looks like:
func (r *Rand) Float64() float64 {
    rand_int := r.Int63n(1 << 53)
    return float64(rand_int) / (1 << 53)
}
This function is supposed to produce floating-point numbers drawn from a uniform distribution on the interval $[0, 1)$: zero is a possible output, but one is not. The number "53" in the algorithm above is chosen in a way that is floating-point aware: double-precision floating-point numbers have 53 bits of precision, so this algorithm creates exactly as many random bits as the format has precision. It seems to fit the bill.
However, this algorithm has a few basic flaws. First, it cannot access most of the floating-point numbers between 0 and 1. There are about $2^{62}$ float64 numbers in this range, and this algorithm can only access $2^{53}$ of them: $\frac{1}{512}$ of the space. Even if we had used a larger integer range, like $2^{64}$, many random integers would alias to the same floating-point result, and some integers would round up to 1 during the division. Second, the least significant bits that come from this generator are biased. This is really a fixed-point random number generator followed by a conversion to floating point.
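As a quick, self-contained illustration of these gaps (my own sketch, not from the paper; the values printed are properties of the float64 format itself):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// The smallest nonzero value the divide-by-2^53 method can produce:
	fmt.Println(1.0 / (1 << 53)) // 1.1102230246251565e-16

	// The smallest positive float64 is far smaller, and every value in
	// between is unreachable by that method:
	fmt.Println(math.SmallestNonzeroFloat64) // 5e-324

	// Using a full 64-bit integer instead does not help: distinct integers
	// alias after conversion, and the largest ones round up to exactly 1.
	big := uint64(math.MaxUint64)
	fmt.Println(float64(big)/(1<<64) == 1.0) // true
}
```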
There have been some past attempts to fix these flaws, but none that avoid a large performance penalty while doing so. I recently finished a paper on a new method for generating perfect random floating-point numbers, and I have put a preprint on GitHub with a reference implementation in Go. Thanks to branch predictors, the perfect algorithm is not much slower than the fixed-point algorithm above.
The Floating-Point Numbers Between 0 and 1
A floating-point number is a tuple of three components, packed into a 32- or 64-bit word:
| Field | Meaning | Bits (float32) | Bits (float64) |
|---|---|---|---|
| Sign ($s$) | $+$ or $-$ | 1 | 1 |
| Exponent ($e$) | Order of magnitude of the leading bit | 8 | 11 |
| Mantissa ($m$) | All significant bits after the leading bit | 23 | 52 |
The storage of each number is efficient: the leading bit is not stored because the exponent implies it, a zero bit when $e = 0$ (which allows $0$ to be represented) and a one bit in all other cases. Floating point also has infinities and non-number values (NaN means "not a number"), which are encoded with an exponent of all ones. Otherwise, the exponent is stored as an unsigned number from which a bias is subtracted, so that negative exponents can be represented. A normal floating-point number has its value given by:
$$ F = (-1)^s 2^{e - eBias} (1.m) $$
Typically, $eBias$ is half of the representable range of the exponent field. The 32-bit floats have an 8-bit exponent, and use $eBias = 127$, so the range of possible exponents is $-126$ to $127$ ($e = 0$ and $e = 255$ have special meanings).
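As a concrete check of this formula, here is a small sketch (illustrative only) that pulls apart the fields of a float64, where $eBias = 1023$, and reassembles the value:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Decode 0.75 = (-1)^0 * 2^(1022-1023) * 1.5 from its bit fields.
	bits := math.Float64bits(0.75)
	s := bits >> 63             // sign
	e := (bits >> 52) & 0x7FF   // biased exponent
	m := bits & ((1 << 52) - 1) // mantissa (fraction bits)

	const eBias = 1023
	value := math.Pow(-1, float64(s)) * math.Ldexp(1+float64(m)/(1<<52), int(e)-eBias)
	fmt.Println(s, e, m, value) // 0 1022 2251799813685248 0.75
}
```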
Half of all floating-point numbers lie between -1 and 1: every number with a biased exponent below $eBias$ has magnitude less than 1. This seems unintuitive, but it is an artifact of how the floating-point numbers work: they have equal precision across the range of magnitudes. The number $0.000001$ and the number $10^{20}$ both have 53 bits of precision (or 24 if we were using 32-bit floats). This is akin to the concept of "significant digits": every floating-point number has 53 significant bits regardless of magnitude. The floating-point numbers are power-law distributed, and many more of them have extreme magnitudes than intuition suggests.
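We can check the "half" claim directly from the encoding: for positive float64 values, numeric order matches bit-pattern order, so counting values in a range is just integer arithmetic (a small sketch, not from the article):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Count positive finite float64 values by subtracting bit patterns.
	below1 := math.Float64bits(1.0) - 1            // positive finite values below 1
	allFinite := math.Float64bits(math.Inf(1)) - 1 // all positive finite values
	fmt.Printf("%.2f%% of positive finite float64 values are below 1\n",
		100*float64(below1)/float64(allFinite))
	// Prints about 49.98%; by sign symmetry, about half of all finite
	// floating-point values lie in (-1, 1).
}
```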
Floating-Point Rounding
When an operation produces too many bits, we need to round them to produce a floating-point result. There are four floating-point rounding modes, but three are relatively specialized:
- Round-to-nearest-ties-to-even: Rounds to the nearest floating-point number, with ties going to the even number. This is close to "rounding" as we learned in school, except that schoolbook rounding involves ties going away from zero. Ties to even is better for numerical stability.
- Round toward zero (truncation): Always round to the floating-point number closer to zero. Equivalent to truncating trailing bits.
- Round down: Always round to the lower number.
- Round up: Always round to the higher number.
Nearest-ties-to-even is the default for arithmetic because it is the most accurate. When we consider random number generation that is floating-point native, we may prefer to follow the chosen rounding mode rather than using the strict idea of operating in $[0, 1)$. Number lines showing some examples of rounding in a hypothetical floating-point format are below, comparing the round down and round to nearest modes.
When rounding to nearest, the regions shown in this number line round to 0.5, 2, and 5 in a hypothetical floating-point format with $p = 2$:
Rounding regions center on the result, but at boundaries between exponents, the rounding region is asymmetric. In the same hypothetical format, compare these regions that round to the same numbers:
In this case, the region that rounds to each number is the region contained between two numbers.
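To play with rounding at a tiny precision like the hypothetical format above, Go's math/big package lets us pick an arbitrary mantissa precision and rounding mode. A small sketch (the 2-bit precision and the value 2.6 are illustrative choices, not taken from the figures):

```go
package main

import (
	"fmt"
	"math/big"
)

func main() {
	// Round 2.6 in a toy 2-bit-precision binary format under several modes.
	// With 2 significant bits, the neighboring representable values are 2 and 3.
	modes := []big.RoundingMode{big.ToNearestEven, big.ToZero, big.ToNegativeInf, big.ToPositiveInf}
	for _, mode := range modes {
		f := new(big.Float).SetPrec(2).SetMode(mode)
		f.SetFloat64(2.6)
		fmt.Printf("%-13v -> %s\n", mode, f.Text('g', 3))
	}
	// ToNearestEven -> 3, ToZero -> 2, ToNegativeInf -> 2, ToPositiveInf -> 3
}
```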
Integer Arithmetic on Floating-Point Numbers
Floating-point numbers are packed into machine words that can be manipulated with bitwise logical operations and integer arithmetic. This lets us do a few slightly dangerous things:
- We can construct floating-point numbers by OR-ing together integers bearing different parts of the number (e.g. attaching a mantissa to an exponent).
- Adding small integers to floating-point numbers is a perturbation by a few units in the last place (ULPs), although this gets weird when you cross exponent boundaries.
- Adding and subtracting large integers that reach the exponent field is a fast way to multiply and divide floating-point numbers, provided you handle the boundary conditions.
These are tricks that allow you to build your own fast operations. Crazier tricks, like the Quake III fast inverse square root, essentially rely on these basic building blocks.
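Here is a small sketch of these three tricks in Go (illustrative values of my own choosing, not code from the paper):

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// 1. Construct a float by OR-ing parts together: the bits of 1.0 supply
	//    the sign and exponent, and we attach the top mantissa bit (0.5).
	one := math.Float64bits(1.0)
	mantissa := uint64(1) << 51
	fmt.Println(math.Float64frombits(one | mantissa)) // 1.5

	// 2. Adding a small integer to the bits perturbs the value by ULPs.
	fmt.Println(math.Float64frombits(math.Float64bits(1.0) + 1)) // next float above 1.0

	// 3. Adding 1<<52 to the bits increments the exponent field: multiply by 2.
	fmt.Println(math.Float64frombits(math.Float64bits(3.0) + (1 << 52))) // 6
}
```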
Uniform Randomness in Floating Point
With this in mind, we can construct a notion for perfect floating-point random number generation:
For unbiased random floating-point numbers, generate floating-point numbers with probabilities given by drawing a real number and then rounding to floating point.
Integer and fixed-point random number generation both follow this rule, but they have the advantage of splitting space into uniform pieces, so each number is equally probable. Floating point does not have this advantage. Instead, the probability depends on the distance between floating-point numbers.
In general, the numbers between 0.5 and 1 are each twice as likely as the numbers between 0.25 and 0.5, which are twice as likely as the numbers between 0.125 and 0.25, and so on. The edges of each exponent range have probability dictated by the rounding mode: when rounding down (or toward zero), the bottom of the range has the same probability as the numbers in the range. When rounding up, the top of the range shares a probability with the numbers in the range. When rounding to nearest, we split the difference. A probability table for the first two exponents is below for uniform generation in $(0, 1)$, using $p$ as the precision in bits ($p = 53$ for 64-bit floating point, and $p = 24$ for 32-bit).
| Output Range | Rounding Mode | Lowest Number | Interior Numbers | Highest Number |
|---|---|---|---|---|
| $[0.5, 1]$ | Down | $2^{-p}$ | $2^{-p}$ | $0$ |
| $[0.5, 1]$ | Up | $2^{-p - 1}$ | $2^{-p}$ | $2^{-p}$ |
| $[0.5, 1]$ | Nearest | $2^{-p - 1} + 2^{-p - 2}$ | $2^{-p}$ | $2^{-p - 1}$ |
| $[0.25, 0.5]$ | Down | $2^{-p - 1}$ | $2^{-p - 1}$ | $2^{-p}$ |
| $[0.25, 0.5]$ | Up | $2^{-p - 2}$ | $2^{-p - 1}$ | $2^{-p - 1}$ |
| $[0.25, 0.5]$ | Nearest | $2^{-p - 2} + 2^{-p - 3}$ | $2^{-p - 1}$ | $2^{-p - 2} + 2^{-p - 1}$ |
The table continues down toward an exponent of zero. For all three modes, interior numbers in these ranges share a probability: each of these has equal mass of real numbers above and below, so nothing changes if we round up, round down, or split the difference. The boundaries of each range do change, however. When rounding down, the powers of two have the same probability as the numbers above them, and when rounding up, the powers of two have the same probability as the numbers below them. When rounding to nearest, we average the probability weight of the numbers above and below, meaning that the powers of two are $1.5$ times as likely as a number in the lower range. The first three rows of the table show a boundary condition: since we only draw real numbers below 1, there is nothing that can round down to 1.
The typical form of random float generation maps best onto the "round down" mode, which cannot generate 1. The other rounding modes give some probability mass to 1. Another interesting fact we can observe is that the least significant bits of these numbers should be uniformly distributed, even though the magnitude of the least significant bit varies.
Since we are treating this as a real number range, the openness or closedness of the range doesn't actually matter. Probability masses are the same when you round real numbers in $(0, 1)$ to floating point as when you round numbers in $[0, 1]$.
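As a sanity check on the round-down rows of this table, here is a small simulation sketch (my own, not part of the paper's reference code) for a toy format with $p = 4$. rand.Float64 stands in for the real-number draw, which is acceptable here because its $2^{-53}$ granularity is vastly finer than the toy format's:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// roundDownToPrecision truncates x in (0, 1) to p significant bits,
// i.e. the "round toward zero / round down" rule for a toy p-bit format.
func roundDownToPrecision(x float64, p int) float64 {
	exp := math.Ilogb(x)            // exponent of the leading bit
	scale := math.Ldexp(1, p-1-exp) // grid spacing in [2^exp, 2^(exp+1)) is 2^(exp-p+1)
	return math.Floor(x*scale) / scale
}

func main() {
	const p = 4
	const n = 1_000_000
	counts := map[float64]int{}
	for i := 0; i < n; i++ {
		x := rand.Float64()
		if x == 0 {
			continue
		}
		counts[roundDownToPrecision(x, p)]++
	}
	// Expected relative frequencies from the table (round down, p = 4):
	// 0.5 and interior points of [0.5, 1): 2^-4 = 0.0625
	// points of [0.25, 0.5):               2^-5 = 0.03125
	for _, v := range []float64{0.5, 0.5625, 0.25, 0.28125} {
		fmt.Printf("%-7g observed %.4f\n", v, float64(counts[v])/n)
	}
}
```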
An Algorithm for Floating-Point Randomness
We can match these probabilities by using an algorithm that works in two steps:
- Generate a fixed-point random number with granularity of $p$, with some special handling if we generate 0, to find the range of real numbers from which we draw the number.
- Finish drawing a number by backfilling the extra bits of precision that are available.
The fixed-point phase finds a range of floating-point numbers that we will draw from, and the finalization phase picks from that range. In the fixed-point phase, we note that only the lowest number range, from $0$ to $2^{-p}$, contains multiple exponents. Thus, to handle this range, we simply work recursively: we run the algorithm as before, but with a scale factor of $2^{-p}$. Once we reach the subnormal numbers, we hit the base case where all possible values from the fixed-point phase map to a specific exponent. This allows us to identify the exponent of the output by doing repeated fixed-point draws.
The fixed-point result here will have a known number of trailing zeros based on the order of magnitude of the result. The finalization phase fills these bits.
In the finalization phase, we add an integer with a number of bits given by the distance between the exponent of $1$ and the exponent of the fixed-point number we generated. This distance is the number of bits of precision we add when we convert the fixed-point number to floating-point, so we just backfill them. This phase varies slightly by rounding mode.
The whole algorithm is given below (for float64
in Go). We use a slightly different form of
fixed-point random number generator: instead of generating a large integer and dividing, we
generate a random mantissa, place it between 1 and 2 (all of which share an exponent), and then
subtract 1. This has exactly the same output distribution as the method of division and may be
slower on some architectures.
1// Generate float64 with a given rounding mode
2func (r *FPRand) Float64Rounding(mode RoundingMode) float64 {
3 const PRECISION = 53
4 const FP64_MANTISSA_SIZE = PRECISION - 1
5 const FP64_MANTISSA_MASK = (1 << FP64_MANTISSA_SIZE) - 1
6
7 // PHASE 1: Zooming fixed-point phase
8
9 // Variables for zooming down the exponent range
10 exp_range := uint64(FP64_MANTISSA_SIZE) << FP64_MANTISSA_SIZE
11 var one = math.Float64bits(1)
12
13 // Track the number of tail bits if we get to underflow
14 var tail_bits uint64 = 0
15
16 // Generate mantissas until we get nonzero, expected to run only once
17 var mantissa = uint64(r.src.Int63n(1 << FP64_MANTISSA_SIZE))
18 for mantissa == 0 {
19 // Zoom down so each step is between mantissa = 0 and 1 from the previous step
20 one -= exp_range
21
22 if one < exp_range {
23 // Low-probability underflow base case after 19 rounds
24 // Exponents go: 1023, 971, 919, ..., 35
25 const UF_TAIL_SIZE = 35
26 const UF_SHIFT = FP64_MANTISSA_SIZE - UF_TAIL_SIZE
27
28 mantissa = uint64(r.src.Int63n(1 << UF_TAIL_SIZE)) << UF_SHIFT
29 tail_bits = UF_SHIFT
30 break // keep the partial mantissa drawn for the base case
31 }
32 mantissa = uint64(r.src.Int63n(1 << FP64_MANTISSA_SIZE))
33 }
34
35 // Generate a number between 1 and 2 (scaled based on loops) and subtract 1
36 // This is the fixed-point random result
37 num := math.Float64frombits(one | mantissa) - math.Float64frombits(one)
38 num_as_int := math.Float64bits(num)
39
40 // PHASE 2: Finalization
41
42 // Find out how many tail bits we need to fill by subtracting exponents
43 tail_bits += (one >> FP64_MANTISSA_SIZE) - (num_as_int >> FP64_MANTISSA_SIZE)
44
45 // Handle exponent of 0 - this code path is unlikely to be hit before the universe ends
46 if tail_bits > FP64_MANTISSA_SIZE {
47 tail_bits = FP64_MANTISSA_SIZE
48 }
49
50 // Generate the tail based on rounding mode
51 var tail uint64 = 0
52 if mode == RoundDown {
53 tail = uint64(r.src.Int63n(1 << tail_bits))
54 } else if mode == RoundUp {
55 tail = uint64(r.src.Int63n(1 << tail_bits)) + 1
56 } else {
57 tail = (uint64(r.src.Int63n(1 << (tail_bits + 1))) + 1) >> 1 // half-ULP slices
58 }
59
60 return math.Float64frombits(num_as_int + tail)
61}
62
63// Functional equivalent to Float64 - generates numbers in [0, 1)
64func (r *FPRand) Float64() float64 {
65 return r.Float64Rounding(RoundDown)
66}
The code really starts at line 10, where we set the value that we will subtract (in integer space) from "one" (line 11) whenever we draw a mantissa of zero. Subtracting this amount from the bit pattern is just a floating-point division by a power of 2. We track tail bits at line 14, although this will be 0 unless we hit the bottom of the exponent range.
Lines 17 and 18 start the action: we generate a mantissa and loop while that mantissa is 0. This is very unlikely with 64-bit floating point! At each loop iteration, we reduce our "one" variable so that it is the size of one tick mark on the previous iteration: the numbers we generate the next time are smaller than what a mantissa of 1 would have given before. The next mantissa is generated at line 32, and we also handle the base case here at line 22: we only generate a partial mantissa in this case because this keeps the subtraction exact when we are in such a ludicrously low exponent range.
Lines 37 and 38 complete fixed-point random number generation: we construct a floating-point
number as the OR of the exponent and sign of the one
variable (which usually is equal to 1)
and the generated mantissa. This gives a result between 1 and 2, scaled based on the number of
times we drew a mantissa of 0. The subtraction on line 37 is thus always exact.
We can calculate how many trailing zeros come out of that subtraction based on the delta between the exponents of the input and the output. This happens at line 43, with a special case for an exponent of zero at line 46, which is reached only about once in every $2^{126}$ random generations.
We now have the bottom of the range of real numbers we are drawing from, represented in floating point, and we know the top of the range based on our count of tail bits. We then backfill the missing mantissa bits randomly to complete the draw. We also know that between the top and the bottom of the range, the space of floating point numbers is divided uniformly, so we can do a fixed-point generation here.
Since we are drawing from the real numbers, rounding down means that we simply generate a number
with tail_bits
bits. Rounding up means generating a random number with that many bits and then
incrementing everything by one. To round to nearest, we have to subdivide the range further, into
slices that are half of a ULP in size. The lowest slice gives a tail of 0, the next two slices
give a tail of 1, and so on. This allows us to exactly match the expected distribution we saw
in the table above.
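To make the half-ULP slicing concrete, this tiny enumeration (illustrative only, for tail_bits = 2) counts how many slices map to each candidate tail value:

```go
package main

import "fmt"

func main() {
	// For tail_bits = t, the range spans 2^t ULPs, i.e. 2^(t+1) half-ULP slices.
	// Slice 0 maps to tail 0, slices 1-2 to tail 1, slices 3-4 to tail 2, and so on,
	// which is what the expression (draw(2^(t+1)) + 1) >> 1 computes.
	const t = 2
	counts := make([]int, (1<<t)+1)
	for slice := 0; slice < 1<<(t+1); slice++ {
		counts[(slice+1)>>1]++
	}
	for tail, c := range counts {
		fmt.Printf("tail %d: %d of %d slices\n", tail, c, 1<<(t+1))
	}
	// Prints 1, 2, 2, 2, 1 slices for tails 0..4: the half-weight endpoints
	// and full-weight interior points from the probability table.
}
```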
Benchmarks and Tests
Despite the number of loops and conditionals in this piece of code, most of them are free (on a CPU) because of how unlikely they are to be taken. The loop at line 18 is entered with probability roughly $2^{-52}$, and the remaining conditionals aside from the rounding mode switch are even more unlikely. This means that, on average, a draw from this generator costs about the same as a typical random generation.
Representative benchmark results for the published reference code, using Go's two random bitstream generators (PCG and ChaCha8) as the entropy source, are below:
The speed here is indistinguishable from the fixed-point random generation algorithm with the PCG bit generator, and we incur a penalty for using the integer side of the CPU when we use the ChaCha8 bitstream generator, which runs on the vector side. The irony of a random integer generator using the floating-point side of the processor and the floating-point generator using the integer side is not lost on me.
The main failing here is that this code does not auto-vectorize the way the fixed-point generator does, but it could be vectorized by hand since the loops are taken so rarely.
We can also see that this is a better random number generator than the fixed-point version with some uniformity testing. The LSBs of the division-based fixed-point generator are clearly distinguishable from random after a relatively small number of samples, with the number of non-uniform LSBs growing as more random floating-point values are generated. The figure below comes from Kolmogorov-Smirnov (K-S) tests of the uniformity of the LSBs of each generator. "Round Down," "Round Up," and "Round to Nearest" are this algorithm in different modes, while "Division" is the fixed-point algorithm using division:
By 100,000,000 random numbers, the fixed-point method has 16 LSBs that are clearly distinguishable from random, while this algorithm generates random floating-point numbers with fully random bits. This is not as big of a deal with 64-bit floating point, but with 32-bit floating point you only have 24 bits of precision to begin with, so having only 8 bits that are indistinguishable from random at that point is a significant loss.
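The source of the bias is easy to see at home; this little sketch (mine, not from the paper) measures how often the lowest mantissa bit of the division-based generator is set:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

func main() {
	// For the division-based generator, whenever the draw lands below 0.5 the
	// conversion to float64 shifts zeros into the low mantissa bits, so the
	// lowest bit is set only about 25% of the time instead of 50%.
	const n = 1_000_000
	ones := 0
	for i := 0; i < n; i++ {
		x := float64(rand.Int63n(1<<53)) / (1 << 53)
		if math.Float64bits(x)&1 == 1 {
			ones++
		}
	}
	fmt.Printf("lowest mantissa bit set in %.1f%% of samples\n", 100*float64(ones)/n)
}
```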
Conclusions
This is the first efficient algorithm for generating random uniform floating-point numbers that can access the entire range of floating point outputs with correct probabilities. It adds two extra steps: one to backfill bits and an unlikely recursion if we get a zero result. This algorithm solves the problem of inaccuracies that arise from floating-point random number generation, and should help make your simulations and computations more accurate.
Finally, I recently wrote a book about floating point and how to use it, with particular applications to games, graphics, and simulators. If you liked this algorithm and want to learn more about floating-point numbers, please peruse the book and consider buying a copy.