## April 8, 2014

### Curiosity killed the cat

I've always been amazed by early game developers. Unlike today, they had to bend and twist software to its extreme limits to squeeze all the performance they could out of elder hardware. Epic and glorious battles where illustrious unknowns fought the machine in the darkness of their basement, such as this legend. The result of these very pleasant evenings are often food for though... or blog entries.

One of such examples is the fast inverse square root algorithm found in the Quake III game. The mathematical relation $\dfrac{1}{\sqrt{x}}$ is needed to compute normalized vectors which are handy to compute light in a software renderer. There was a problem, though. Performing a floating point a square root followed by a division on a 1990-era processor was painfully slow, resulting in dreadfully low frame rates and poor player experience. Nowadays, CPUs (using SIMD instructions such as rsqrtss) or GPUs perform the inverse square function in real-time with high accuracy, making this trick obsolete. But it is so mind boggling that it is interesting to analyse.

### It's turtles all the way down

Let's examine this trick. First, an evil alien bit hack using the 0x5f3759df constant and a right shift on a floating point number (!) is performed, followed by an iteration of the Newton-Raphson method. Here is the code directly taken from the ./code/game/q_math.c source file from the 1.32b version of the Quake III source code:
As you can see, the core of the fast inverse square root trick is on line 10. A right shift, followed by a subtraction. All performed by the ALU. Way faster than floating-point epsilon-perfect computation.

That made me think. Can I come up with this kind of trick? But that would require a deep analysis of the floating point representation (IEEE 754), and, quite frankly, i'm a lazy guy. Why do it myself when I can let the computer do it?

Armed with my favorite Evolutionary Algorithm library in Python, DEAP, I embarked on the journey of finding dirty floating point representation tricks to approximate common functions. The idea is to let Genetic Programming (GP) evolve an equation using integer tricks to represent floating point operations. Two objectives were set:
1. Accuracy;
2. Total operation cost.
Conceptually quite easy to implement. But a problem arose: these integer tricks are highly dependent on precise constants that are way too hard for GP to guess. To circumvent this issue, I built a two stage evolution. While keeping the first GP stage to evolve mathematical equations, every individual evaluation would require a parameter optimization. The individuals would have terminals similar to inputs but in facts are constants that must be found by the second stage of evolution. To perform this second stage, a simpler algorithm based on Scipy's optimize module is used. This stage won't be able to solve large and difficult equations (individuals) that may produce the GP or may be stuck in local minimas, but large individuals mean costly to execute and thus not an interesting path to scrutinize.

Long story short, here is the code:
The program is roughly divided as such:
• Lines 1 - 16: Imports. You will need numpy, scipy and deap to run the example;
• Lines 18 - 36: Safe arithmetic operators. Without them, division by zeros and similar would occur during evolution;
• Lines 39 - 49: Operation execution cost. The values are more rule of thumb ideas than actual benchmarks;
• Lines 51 - 85: Helper function to show the equation in its infix notation instead of DEAP's default prefix. It only improves readability and does not affect evolution;
• Lines 88 - 103: Primitives definition. Sadly, four parameters (a, b, c, d) were defined instead of a dynamic array, but this simplification should do the trick;
• Lines 105 - 112 and 152 - 188: DEAP's usual setup and execution;
• Lines 114 - 115: The simplest way I found to reinterpret floats in integers in Python;
• Lines 117 - 128: Second stage of optimization that tries to get the best parameters (a, b, c, d) of an individual. Line 122 defines the error minimization function;
• Lines 130 - 147: First stage of optimization, GP individual handling;
• Line 149: Set of input values on which we strive to minimize the error.

### Data, reporting for duty

Once the program has run, most entries from the hall of fame (best individuals) are analogous to $(a + a) - (x >> 1)$ or other variations around constants. No other equation family were up to battle against it.

As it can be seen, we found the original equation from Quake 3. Well, almost... we clearly see a problem with GP: bloat. It found $(a + a)$ instead of $a$, which are semantically identical given that $a$ is a constant. If we simplify the equation manually, the second stage found an optimal value around 0x5f33f870, which is pretty near the original magic constant. Fine-tuning the Nelder-Mead algorithm of the second stage would give a better constant. The general look of the function is shown in the following figures.
Figure 1: Output of $a - (x >> 1)$ (blue) and $\dfrac{1}{\sqrt{x}}$ (green) for input values [0, 50].
Figure 2: General view of the relative error of the $a - (x >> 1)$ function
Figure 3: Relative error of the $a - (x >> 1)$ function with different values of a zoomed around 0x5f340000

### Facing my Waterloo

Sadly, my selfish self found no better trick than the one in the Quake III source. At least, the original™ fast inverse square root trick was found. In the next episode, we'll try to uncover other novel and unpredictable approximations to other common expressions. Stay tuned!