alexeev-dev notes

Math, Bits, Magic, and a Bit of Abnormal C Programming

Good day, ladies and gentlemen! Sometimes the urge strikes to engage in outright indecency in programming: things that bring no direct practical benefit but are great fun. I am no exception. In this article, I want to tell you about lifehacks, tricks (magical and not so magical), and algorithms in the C language!

The idea to write this article came from my post. In it, I explained how the Fibonacci sequence can convert miles to kilometers with a small margin of error. Seeing that many people liked it, I thought: why not explore some other tricks while practicing C programming?

Everyone who's interested, please follow me under the cut.


Since I've already started talking about Fibonacci, let me quote a bit from my post:

The Fibonacci sequence can convert miles to kilometers with a small error. 5 miles ≈ 8 km (5 and 8 are Fibonacci numbers). Reality: 5 miles = 8.04672 km.
Why? 1 mile = 1.609344 kilometers (exact value). Golden ratio (φ) ≈ 1.618034
The error arises because the ratio Fₙ₊₁ / Fₙ tends toward φ ≈ 1.618034, while the exact mile/km ratio = 1.609344.
Relative error: (1.618034 - 1.609344) / 1.609344 * 100% ≈ 0.54%.

Fun fact: if you take a cube and a sphere of the same volume, the radius of the sphere in miles equals the edge of the cube in kilometers, because the cube root of (4/3)π is approximately 1.61. © Enfriz

Let's take a look at how to implement such a converter in C.

So, let's say we have this function:

uint64_t fibonacci(int num) {
    if (num <= 0)
        return 0;

    uint64_t a = 0;
    uint64_t b = 1;

    if (num == 1)
        return b;

    for (int i = 2; i <= num; i++) {
        uint64_t next = a + b;
        a = b;
        b = next;
    }

    return b;
}

I think you all know how this algorithm works. The only thing worth clarifying is that we must pass the index of the Fibonacci number that follows the one equal to our mileage: to convert 5 miles to kilometers, note that 5 = F(5), so we pass 5+1=6 and get F(6) = 8.

Also, we can do the calculation based on the golden ratio — since it is also connected to the Fibonacci sequence.

float fib_golden_ratio(float miles) {
    const double PHI = (1.0 + sqrt(5.0)) / 2.0;

    if (miles < 1e-5) {
        return 0.0F;
    }

    double n = log(miles * sqrt(5.0)) / log(PHI);
    int k = (int)floor(n);

    double Fk = (pow(PHI, k) - pow(-PHI, -k)) / sqrt(5.0);
    double Fk1 = (pow(PHI, k + 1) - pow(-PHI, -k - 1)) / sqrt(5.0);
    double Fk2 = (pow(PHI, k + 2) - pow(-PHI, -k - 2)) / sqrt(5.0);

    if (Fk1 - Fk < DBL_EPSILON) {
        return basic_miles2km(miles);
    }

    return Fk1 + ((miles - Fk) * ((float)(Fk2 - Fk1) / (Fk1 - Fk)));
}

Based on the number φ ((1 + √5) / 2), we can also convert miles to kilometers. The golden ratio φ ≈ 1.618 is close to 1.609, so ratios of consecutive Fibonacci numbers approximate the conversion factor.

The variable n here is the index of the Fibonacci number corresponding to the value miles, based on the property that F(n) ≈ φⁿ / √5. A logarithm is used to solve for n, which is then rounded down to the integer k.

The variables Fk, Fk1, Fk2 are three consecutive Fibonacci numbers calculated using Binet's formula: F(n) = (φⁿ − (1 − φ)ⁿ) / √5.

Binet's formula is an explicit formula for the n-th Fibonacci number that avoids recursive computation. It is named after the French mathematician Jacques Binet. The formula yields a Fibonacci number in constant time, O(1) (assuming a constant-time pow), whereas the naive recursive method takes exponential time. This makes it very useful in computations involving Fibonacci sequences.

If the difference between F(k+1) and F(k) is less than DBL_EPSILON, it means the numbers are too close to each other, and the function simply calls basic_miles2km instead of computing a further value.

But why do we call basic_miles2km? After all, we're dealing with non-typical conversion methods. This is actually a safety mechanism. If Fk1 ≈ Fk (difference less than DBL_EPSILON), the denominator tends to zero → indeterminacy or NaN arises. Binet's formula can also work incorrectly for small n (especially n < 3).

The condition (Fk1 - Fk < DBL_EPSILON) means: "the difference between neighboring Fibonacci numbers is below the distinguishability threshold of floating-point numbers." DBL_EPSILON is the machine epsilon for the double type (64 bits) from float.h, equal to 2⁻⁵² ≈ 2.22e-16. It is commonly used when comparing floating-point numbers: two numbers can be treated as identical by machine arithmetic if their relative difference is smaller than DBL_EPSILON. (Strictly speaking, the code above compares an absolute difference, which only works here because the values involved are of moderate magnitude.)

float basic_miles2km(float miles) {
    return miles * 1.609344f;
}

And at the end, linear interpolation is performed to obtain the resulting kilometer value: return Fk1 + ((miles - Fk) * ((float)(Fk2 - Fk1) / (Fk1 - Fk)));. If the early-exit conditions are not met, the function interpolates between neighboring Fibonacci numbers based on the value of miles. Note that this approach can accumulate floating-point error at large n.

Let's move on to another similar implementation:

float fib_interpolate(float miles) {
    if (miles < 5.0F) {
        return basic_miles2km(miles);
    }

    uint64_t prev_mile = 0;
    uint64_t prev_km = 1;
    uint64_t curr_mile = 1;
    uint64_t curr_km = 2;

    while (curr_mile <= miles) {
        prev_mile = curr_mile;
        prev_km = curr_km;

        curr_mile = prev_km;
        curr_km = prev_mile + prev_km;

        if (curr_km < prev_km || curr_mile < prev_mile) {
            break;
        }
    }

    return prev_km + ((miles - prev_mile) * ((float)(curr_km - prev_km) / (curr_mile - prev_mile)));
}

The function stores mile and kilometer values in 64-bit unsigned integers: prev_mile and prev_km hold the previous pair of distances, while curr_mile and curr_km hold the current pair. The main loop runs as long as the current mile value (curr_mile) does not exceed the passed miles value. On each iteration, the variables advance along the Fibonacci sequence: the current mile value becomes the previous kilometer value, and the kilometer value becomes the sum of the two previous values.

To prevent overflow, the loop body also contains a check: if the new kilometer or mile value wraps around and becomes smaller than the previous one, the loop breaks. This condition guards against 64-bit unsigned overflow and the incorrect results it would produce.

After the loop completes, interpolation is performed to obtain the kilometer value based on the previous values, using linear interpolation. This calculation allows for more accurate value finding than a simple conversion based only on Fibonacci numbers. Nevertheless, such interpolation may be insufficiently accurate for large mile values, which weakens the result accuracy depending on the input parameter range.

The time complexity of the fib_interpolate function is O(n), where n is the position in the Fibonacci sequence.

You can view all the code in more detail in the fib_miles2km repository or directly in the collection repository for this article The Art Of Fun C.

Binary Exponentiation

Since we already used pow() in Binet's formula, let's optimize it through binary exponentiation.

Let's try to substitute the standard pow with our own algorithm, more precisely, with binary exponentiation. Complexity: O(log n).

This is a widely known algorithm for raising a number to an integer power exactly. The principle is simple: given an integer exponent e, we repeatedly square the base b (in the code, b *= b), which walks it through the powers 1, 2, 4, ..., 2ⁿ, while shifting the bits of e to the right (e >>= 1) until e reaches 0. Whenever the lowest bit of e is set ((e & 1) != 0), we multiply the accumulated result v by the current b. This raises a number to an integer power in logarithmic time, which is especially important when working with large exponents.

Let's take 2⁵. Instead of multiplying 2×2×2×2×2, we write 5 in binary (101) and walk the bits from right to left. The lowest bit is 1, so the result picks up b = 2, and b is squared to 4. The next bit is 0, so we only square again: b = 16. The final bit is 1, so we multiply the result (2) by the current b (16) and get 32.

Technically, on each loop pass, we check the least significant bit of the exponent e. If this bit equals 1 (e & 1 != 0), we multiply the current result v by the base b. After that, we multiply b by itself (square), which corresponds to the next power in the binary representation. By decreasing e using a right bit shift, we continue the loop until e reaches zero, as a result of which we get the final result.

double binary_pow(double b, unsigned long long e) {
    double v = 1.0;
    while (e != 0) {
        if ((e & 1) != 0) {
            v *= b;
        }
        b *= b;
        e >>= 1;
    }
    return v;
}

And upon execution, we can get 10.00 ** 2.00 = 100.00, 10.50 ** 2.00 = 110.25. But if the exponent is not an integer, this method won't work: the fractional part is simply truncated, so 10.50 ** 2.50 is also calculated as 110.25. So it's important to note that this algorithm is only applicable to integer exponents.

Well, here's an example of the modified fib_golden_ratio function from the previous example:

float fib_golden_ratio_binary(float miles) {
    const double PHI = (1.0 + sqrt(5.0)) / 2.0;

    if (miles < 1e-5) {
        return 0.0F;
    }

    double n = log(miles * sqrt(5.0)) / log(PHI);
    int k = (int)floor(n);
    if (k < 1) {
        /* binary_pow takes an unsigned exponent, so bail out for small inputs */
        return basic_miles2km(miles);
    }

    double sign_k = (k % 2 == 0) ? 1.0 : -1.0;
    double sign_k1 = ((k+1) % 2 == 0) ? 1.0 : -1.0;
    double sign_k2 = ((k+2) % 2 == 0) ? 1.0 : -1.0;

    double phi_k = binary_pow(PHI, k);
    double phi_k1 = binary_pow(PHI, k+1);
    double phi_k2 = binary_pow(PHI, k+2);

    double Fk = (phi_k - sign_k / phi_k) / sqrt(5.0);
    double Fk1 = (phi_k1 - sign_k1 / phi_k1) / sqrt(5.0);
    double Fk2 = (phi_k2 - sign_k2 / phi_k2) / sqrt(5.0);

    if (Fk1 - Fk < DBL_EPSILON) {
        return basic_miles2km(miles);
    }

    return Fk1 + ((miles - Fk) * ((float)(Fk2 - Fk1) / (Fk1 - Fk)));
}

Xorshift Pseudorandom Number Generator

Let's move on to a no less interesting algorithm — a pseudorandom number generator. Let's try to implement our own generator of such numbers based on Xorshift. Xorshift is a family of pseudorandom number generation algorithms that use shift and exclusive OR (XOR) operations. They are simple to implement, very fast, and yield sufficiently good results for many tasks.

uint64_t xorshift64(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

This algorithm works as follows:

  1. Left shift + XOR (x << 13): "Smears" bits into higher-order positions. The XOR operation mixes the original bits with the modified ones.
  2. Right shift + XOR (x >> 7): Works as a "reverse" step — compensates for the previous shift, adding nonlinearity.
  3. Left shift + XOR (x << 17): Fixes the result and ensures complete bit mixing. The numbers 13, 7, 17 were chosen experimentally for optimal quality.

The key advantage: only three operations per number, which makes the algorithm incredibly fast, O(1) per generated number.

To generate the state, let's write a function to get a timestamp in microseconds:

/* requires <sys/time.h> */
uint64_t get_seed(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * (uint64_t)1000000 + tv.tv_usec;
}

Additionally, for completeness, let's implement functions for generating a floating-point number and range generation:

double rand_double(uint64_t *state) {
    return (xorshift64(state) >> 11) * (1.0 / (UINT64_C(1) << 53));
}

This function returns a number in the range [0.0, 1.0). First, we generate a 64-bit number and shift it right by 11 bits. This is to discard the lower-order bits. As a result, 53 bits remain. After that, we calculate 1.0 / (UINT64_C(1) << 53) (UINT64_C(1) << 53 = 2⁵³ = 9,007,199,254,740,992.). In the end, we get the minimum step between double numbers: 1.0 / 2⁵³ ≈ 1.110223e-16. Finally, we multiply the 53 bits by this very step.

If you wondered why specifically 53 bits — in the IEEE 754 standard, the double type has a 53-bit mantissa. The structure of a double-precision number: sign — 1 bit, exponent — 11 bits, mantissa — 52+1 bits.

And generating a random number in a range:

uint64_t rand_range(uint64_t *state, uint64_t min, uint64_t max) {
    return min + xorshift64(state) % (max - min + 1);
}

This one is simpler: xorshift64(state) gives a random number. Through % (max - min + 1), we get the remainder of division by the length of the range, and add min to the final number.

And in the end, we can get the following operation results:

xorshift64 random number: 8549869788877919663 # Random number
xorshift64 random num from 10 to 100: 72 # Random number in a range
xorshift64 double random number: 0.461347 # Random fractional number

xoshiro256pp Random Number Generator

Let's move on to a more complex but interesting pseudorandom number generator. Unlike xorshift, this generator can already pass statistical tests.

Let's break down the code:

typedef struct {
    uint64_t s[4];
} xoshiro256pp_state;


static inline uint64_t rotl(const uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

uint64_t xoshiro256pp_next(xoshiro256pp_state *state) {
    uint64_t *s = state->s;
    uint64_t result = rotl(s[0] + s[3], 23) + s[0];

    uint64_t t = s[1] << 17;
    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl(s[3], 45);

    return result;
}

void xoshiro256pp_init(xoshiro256pp_state *state, uint64_t seed) {
    uint64_t tmp = seed;
    for (int i = 0; i < 4; i++) {
        tmp ^= tmp >> 30;
        tmp *= 0xbf58476d1ce4e5b9;
        tmp ^= tmp >> 27;
        tmp *= 0x94d049bb133111eb;
        tmp ^= tmp >> 31;
        state->s[i] = tmp;
    }
}
  1. We create the structure xoshiro256pp_state, which contains an array s of four 64-bit integers (256 bits in total).
  2. The function rotl performs a cyclic left shift of a 64-bit number x by k positions: bits that fall off the high end wrap around to the low end.
  3. The function xoshiro256pp_init initializes the generator state from a seed (you can also use the get_seed function from the previous example); a loop fills the s array with four values derived through a splitmix64-style mixer.
  4. The function xoshiro256pp_next receives a pointer to the state structure. First, we take a pointer s to the state array and compute result, the main generated value: s[0] and s[3] are added, the sum is rotated left by 23 bits, and s[0] is added again. The body then mixes the state: t stores s[1] shifted left by 17 bits, several XOR operations entangle the array elements so that the next state depends on all of them, s[2] absorbs t, and finally s[3] is rotated left by 45 bits before result is returned.

This algorithm improves on xorshift64 by using 256 bits of state instead of 64, and it generates numbers with a more uniform distribution. The output is higher-quality and harder to predict, which makes the algorithm quite good (though still unsuitable for cryptographic purposes). All this comes from the more complex generation logic. It runs slightly slower than xorshift64, but the result is much better.

Lehmer Random Number Generator

A very fast pseudorandom number generator. You can read more on the Wikipedia page.

The Lehmer64 algorithm is a fast pseudorandom number generator (PRNG) based on the multiplicative linear congruential method. It uses a 128-bit state to generate 64-bit random numbers, providing high performance and sufficient statistical randomness for most non-cryptographic tasks.

The multiplicative linear congruential method is an algorithm for generating pseudorandom numbers based on a linear recurrence. In simple terms, it creates a sequence where each subsequent number depends on the previous one; the sequence is ultimately periodic, repeating with some cycle length (the period).

Operating principle. The basic formula of the method is Xₙ₊₁ = (a ⋅ Xₙ + c) mod m, where Xₙ is the current state, a is the multiplier, c is the increment, and m is the modulus; in the purely multiplicative variant used by Lehmer, c = 0.

And here is its simplest implementation in C:

static __uint128_t g_lehmer64_state;

uint64_t lehmer64(void) {
    g_lehmer64_state *= 0xda942042e4dd58b5ULL;
    return g_lehmer64_state >> 64;
}

Every time a new number is needed, we multiply the state by the constant 0xda942042e4dd58b5ULL (chosen for its good statistical properties). g_lehmer64_state is a 128-bit state that must initially hold an odd seed. After the multiplication, we return the upper 64 bits; the lower bits remain in the state.

This generator is simple to implement, but according to my benchmarks, it is the slowest among the three generators in the article.

Fast Inverse Square Root from Quake III

In 2005, id Software published the source code of its 1999 game Quake III Arena under the GPL-2 license. In the file code/game/q_math.c, there is a function for calculating the inverse square root of a number, which at first glance looks like a very curious algorithm:

float Q_rsqrt( float number )
{
	long i;
	float x2, y;
	const float threehalfs = 1.5F;

	x2 = number * 0.5F;
	y  = number;
	i  = * ( long * ) &y;						// evil floating point bit level hacking
	i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
	y  = * ( float * ) &i;
	y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//	y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

#ifndef Q3_VM
#ifdef __linux__
	assert( !isnan(y) ); // bk010122 - FPE?
#endif
#endif
	return y;
}

How does this algorithm work? It executes in two stages:

  1. Obtaining a rough approximation y of the inverse square root of the required number number:
y  = number;
i  = * ( long * ) &y;
i  = 0x5f3759df - ( i >> 1 );
y  = * ( float * ) &i;
  2. Refining the approximation using one step of the Newton-Raphson (NR) method:
const float threehalfs = 1.5F;
x2 = number * 0.5F;
y  = y * ( threehalfs - ( x2 * y * y ) );

The algorithm takes a 32-bit floating-point number (single precision in IEEE 754 format) as input. Its relative error is below 0.2%, always on the low side and never the high side. This is insufficient for serious numerical calculations, but sufficient for three-dimensional graphics.

You can read more about how this function works in this article and on the Wikipedia page.

For testing, you can slightly modify the code:

float Q_rsqrt(float number) {
    int32_t i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( int32_t* ) &y;
    i  = 0x5f3759df - ( i >> 1 );
    y  = * ( float* ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );
    return y;
}

And we should get an almost correct result: Q_rsqrt(25.00) = 0.199690. For more accuracy, one more Newton iteration can be added, y = y * (threehalfs - (x2 * y * y));, giving Q_rsqrt(25.00) = 0.199999.

Conclusion

Thank you for reading the article! I hope you learned something new, or maybe some trick inspired you to explore another interesting algorithm. If you spotted a mistake in the article, write in the comments. Perhaps I will continue this series of articles about tricks in different programming languages.

If you liked the presented material, I suggest subscribing to my Telegram blog, where there is a bit more of the same.

You can see examples of the code in action in my repository The Art Of Fun C.

All the presented methods are more intellectual entertainment than practical solutions, but they demonstrate nicely how mathematics and low-level tricks can produce non-obvious optimizations. That said, the pseudorandom number generators are perfectly usable for simulations or game development, and even the inverse square root from Quake III has its uses.

Benchmarks

In the repository, I also wrote some small benchmarks:

PRNG Performance (10,000,000 iterations):
-----------------------------------------
xorshift64:      14.18 ms  (705.37M numbers/s)
lehmer64:        20.71 ms  (482.89M numbers/s)
xoshiro256pp:    15.95 ms  (626.77M numbers/s)
-----------------------------------------

Conversion Methods Performance (each method called 10000 times per point):
----------------------------------------------------------------------
Basic                    :     0.28 ms  ( 0.001 us/call)
Fibonacci Interpolation  :     1.56 ms  ( 0.008 us/call)
Fibonacci Cache          :     1.41 ms  ( 0.007 us/call)
Golden Ratio             :    16.33 ms  ( 0.082 us/call)
Golden Ratio (Binary)    :     3.56 ms  ( 0.018 us/call)
----------------------------------------------------------------------

Accuracy Comparison (5 sample points):
Miles |   Basic   | INTERPOL |  Cache   |  Golden  | GoldenBin
------+-----------+----------+----------+----------+-----------
    5 |      8.05 |    0.58% |    0.58% |    0.58% |    0.58%
   30 |     48.28 |    0.53% |    0.53% |    0.53% |    0.53%
   55 |     88.51 |    0.55% |    0.55% |    0.55% |    0.55%
   80 |    128.75 |    0.54% |    0.54% |    0.54% |    0.54%
  100 |    160.93 |    0.54% |    0.54% |    0.54% |    0.54%
---------------------------------------------------------------

Summary:

  1. Generators (10,000,000 iterations each)
  2. Conversion (200,000 calls: 20 points × 10,000 iterations)

All methods have an error from 0.53 to 0.58 percent.

Sources