April 8, 2014

Math evolution and dirty tricks

Curiosity killed the cat

I've always been amazed by early game developers. Unlike today, they had to bend and twist software to its extreme limits to squeeze all the performance they could out of older hardware. Epic and glorious battles where illustrious unknowns fought the machine in the darkness of their basements, such as this legend. The results of these very pleasant evenings are often food for thought... or blog entries.

One such example is the fast inverse square root algorithm found in the Quake III game. The mathematical relation 1/√x is needed to compute normalized vectors, which are handy to compute lighting in a software renderer. There was a problem, though. Performing a floating-point square root followed by a division on a 1990-era processor was painfully slow, resulting in dreadfully low frame rates and a poor player experience. Nowadays, CPUs (using SIMD instructions such as rsqrtss) and GPUs perform the inverse square root function in real time with high accuracy, making this trick obsolete. But it is so mind-boggling that it is interesting to analyse.
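To see why 1/√x shows up everywhere in a renderer, here is a minimal sketch of vector normalization (my own illustration, not code from Quake III): with a fast reciprocal square root, normalizing costs one rsqrt and three multiplications instead of a square root followed by three divisions.

def normalize(v):
    # Normalizing v means dividing each component by the vector's length,
    # i.e. multiplying each component by 1/sqrt(x), where x is the squared length.
    inv_len = (v[0]*v[0] + v[1]*v[1] + v[2]*v[2]) ** -0.5
    return (v[0]*inv_len, v[1]*inv_len, v[2]*inv_len)

print(normalize((3.0, 0.0, 4.0)))  # (0.6, 0.0, 0.8)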

It's turtles all the way down

Let's examine this trick. First, an evil alien bit hack using the 0x5f3759df constant and a right shift on a floating-point number (!) is performed, followed by one iteration of the Newton-Raphson method. Here is the code, taken directly from the ./code/game/q_math.c source file of the 1.32b version of the Quake III source code:

float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}
As you can see, the core of the fast inverse square root trick is on line 10: a right shift followed by a subtraction, both performed by the integer ALU. Way faster than a bit-exact floating-point square root and division.
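Why does this work at all? The bit pattern of a positive float, read as an integer, is approximately a scaled-and-offset log2 of its value. Halving and negating that integer therefore approximates log2(1/√x) = -½ log2(x), and the magic constant restores the exponent bias that the shift destroyed. Here is a quick check in Python (my own sketch, using struct instead of the C pointer cast):

import struct

def rsqrt_bithack(x):
    # Reinterpret the float's bits as a 32-bit integer (like the C pointer cast).
    i = struct.unpack('<i', struct.pack('<f', x))[0]
    # The shift-and-subtract: builds an approximation of 1/sqrt(x) in bit space.
    i = 0x5f3759df - (i >> 1)
    y = struct.unpack('<f', struct.pack('<i', i))[0]
    # One Newton-Raphson iteration to polish the estimate.
    return y * (1.5 - 0.5 * x * y * y)

print(rsqrt_bithack(4.0))  # ~0.4992, exact value is 0.5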

That made me think: can I come up with this kind of trick? But that would require a deep analysis of the floating point representation (IEEE 754), and, quite frankly, I'm a lazy guy. Why do it myself when I can let the computer do it?

Armed with my favorite Evolutionary Algorithm library in Python, DEAP, I embarked on the journey of finding dirty floating point representation tricks to approximate common functions. The idea is to let Genetic Programming (GP) evolve equations made of integer operations that, applied to the bit pattern of a float, approximate a floating-point function. Two objectives were set:
  1. Accuracy;
  2. Total operation cost.
Conceptually quite easy to implement. But a problem arose: these integer tricks are highly dependent on precise constants that are way too hard for GP to guess. To circumvent this issue, I built a two-stage evolution. The first stage is still GP evolving mathematical equations, but every individual evaluation now triggers a parameter optimization. The individuals have terminals that look like inputs but are in fact constants to be found by the second stage. This second stage is a simpler optimizer based on SciPy's optimize module. It won't be able to handle the large and difficult equations (individuals) that the GP may produce, and it may get stuck in local minima, but large individuals are costly to execute anyway and thus not an interesting path to scrutinize.

Long story short, here is the code:

# Mostly stolen from deap's symbreg GP example
import operator
import math
import random
import string
import inspect
import ctypes

import numpy
from scipy import optimize

from deap import algorithms
from deap import base
from deap import creator
from deap import tools
from deap import gp

# Define new functions (safe versions of the integer operators)
def safeIntDiv(left, right):
    try:
        return int(left / right)
    except (ZeroDivisionError, OverflowError):
        return 0

def safeIntMod(left, right):
    try:
        return int(left % right)
    except (ZeroDivisionError, OverflowError):
        return 0

def safeLShift(val):
    return operator.lshift(val, 1)

def safeRShift(val):
    return operator.rshift(val, 1)


# Operation cost table (rough rules of thumb, not benchmarks)
OperationCost = {
    operator.add.__name__: 1,
    operator.sub.__name__: 1,
    operator.xor.__name__: 1,
    safeIntMod.__name__: 4,
    operator.mul.__name__: 2,
    safeIntDiv.__name__: 4,
    operator.neg.__name__: 1,
    safeLShift.__name__: 1,
    safeRShift.__name__: 1,
}

# Infix pretty-printing of DEAP's prefix trees
defaultSymbolMap = {
    'add': '({0} + {1})',
    'sub': '({0} - {1})',
    'xor': '({0} ^ {1})',
    'safeIntMod': '({0} % {1})',
    'mul': '({0} * {1})',
    'safeIntDiv': '({0} / {1})',
    'neg': '~{0}',
    'safeLShift': '({0} << 1)',
    'safeRShift': '({0} >> 1)',
}

def getInfix(individual, symbolMap=defaultSymbolMap, index=0):
    x = individual[index]
    if x.arity >= 3:
        data = []
        last_index = index
        for _ in range(x.arity):
            out, last_index = getInfix(individual, index=last_index + 1)
            data.append(out)
        retVal = x.seq.format(*data)
    elif x.arity == 2:
        out_left, next_idx = getInfix(individual, index=index + 1)
        out_right, last_index = getInfix(individual, index=next_idx + 1)
        retVal = symbolMap.get(x.name, x.name).format(out_left, out_right)
    elif x.arity == 1:
        val, last_index = getInfix(individual, index=index + 1)
        retVal = symbolMap.get(x.name, x.name).format(val)
    else:
        retVal, last_index = x.value, index
    if index == 0:
        return retVal
    return retVal, last_index


# Primitive set: one real input (x) and four evolvable constants (a, b, c, d)
pset = gp.PrimitiveSet("MAIN", 5)
pset.addPrimitive(operator.add, 2)
pset.addPrimitive(operator.sub, 2)
pset.addPrimitive(operator.mul, 2)
pset.addPrimitive(safeIntMod, 2)
pset.addPrimitive(operator.xor, 2)
pset.addPrimitive(safeIntDiv, 2)
pset.addPrimitive(operator.neg, 1)
pset.addPrimitive(safeLShift, 1)
pset.addPrimitive(safeRShift, 1)
pset.renameArguments(ARG1='a')
pset.renameArguments(ARG2='b')
pset.renameArguments(ARG3='c')
pset.renameArguments(ARG4='d')
# Problem input
pset.renameArguments(ARG0='x')

creator.create("FitnessMulti", base.Fitness, weights=(-1.0, -1.0))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMulti)

toolbox = base.Toolbox()
toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=2)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)

float2Int = lambda x: ctypes.cast(ctypes.pointer(ctypes.c_float(x)), ctypes.POINTER(ctypes.c_int32)).contents.value
int2Float = lambda x: ctypes.cast(ctypes.pointer(ctypes.c_int32(x)), ctypes.POINTER(ctypes.c_float)).contents.value

def evalParams(params, func, points):
    # Evaluate the mean squared error between the expression
    # and the real function
    intparams = list(map(float2Int, params))
    intrepr = lambda x: int2Float(func(float2Int(x), *intparams))
    sqerrors = ((intrepr(x) - x ** -0.5) ** 2 for x in points)
    #sqerrors = ((intrepr(x) - x ** 0.5) ** 2 for x in points)  # square root variant
    errsum = math.fsum(sqerrors)
    if math.isnan(errsum):
        errsum = len(points) * 1e+40
    return errsum


def evalInvSqrt(individual, points):
    # Compute the difficulty (execution cost) of the individual
    difficulty = 1
    for x in individual:
        difficulty += OperationCost.get(x.name, 0)
    # Transform the tree expression into a callable function
    func = toolbox.compile(expr=individual)
    nb_args = len(inspect.getargspec(func)[0])
    # Optimize the "constants" (second evolution stage)
    res = optimize.fmin(evalParams,
                        [1 for _ in range(nb_args - 1)],
                        args=(func, points),
                        full_output=True,
                        disp=False)
    errsum = res[1]
    individual.optim_params = res[0]
    return errsum / len(points), difficulty


points = [x / 10. for x in range(0, 50) if x != 0]


toolbox.register("evaluate", evalInvSqrt, points=points)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)

def main():
    #random.seed(31415926535)
    pop = toolbox.population(n=200)
    hof = tools.HallOfFame(20)

    stats_fit = tools.Statistics(lambda ind: ind.fitness.values)
    stats_size = tools.Statistics(len)
    mstats = tools.MultiStatistics(fitness=stats_fit, size=stats_size)
    mstats.register("avg", numpy.mean)
    mstats.register("std", numpy.std)
    mstats.register("min", numpy.min)
    mstats.register("max", numpy.max)

    try:
        pop, log = algorithms.eaSimple(pop, toolbox, 0.5, 0.1, 20, stats=mstats,
                                       halloffame=hof, verbose=True)
    except KeyboardInterrupt:
        pass

    print()
    for idx, y in enumerate(hof):
        err, cost = evalInvSqrt(y, points)
        optim_params = {key: val for key, val in zip(string.ascii_lowercase, [hex(float2Int(x)) for x in y.optim_params])}
        print("#{idx:01d}: ({err:.6f}, {cost}) {eq} {optim_params}".format(
            eq=getInfix(y),
            **locals()
        ))
    return pop, log, hof

if __name__ == "__main__":
    main()
The program is roughly divided as follows:
  • Lines 1 - 16: Imports. You will need numpy, scipy and deap to run the example;
  • Lines 18 - 36: Safe arithmetic operators. Without them, divisions by zero and similar errors would abort the evolution;
  • Lines 39 - 49: Operation execution cost. The values are rule-of-thumb estimates rather than actual benchmarks;
  • Lines 51 - 85: Helper function to show an equation in infix notation instead of DEAP's default prefix. It only improves readability and does not affect evolution;
  • Lines 88 - 103: Primitive definitions. Sadly, four parameters (a, b, c, d) were defined instead of a dynamic array, but this simplification should do the trick;
  • Lines 105 - 112 and 152 - 188: DEAP's usual setup and execution;
  • Lines 114 - 115: The simplest way I found to reinterpret a float's bits as an integer (and back) in Python;
  • Lines 117 - 128: Second stage of optimization, which searches for the best parameters (a, b, c, d) of an individual. Line 122 defines the error function being minimized;
  • Lines 130 - 147: First stage of optimization, GP individual handling;
  • Line 149: Set of input values on which we strive to minimize the error.

Data, reporting for duty

Once the program has run, most entries in the hall of fame (the best individuals) are analogous to (a + a) - (x >> 1) or other variations around constants. No other equation family was able to compete with it.

As can be seen, we found the original equation from Quake 3. Well, almost... we clearly see a classic GP problem: bloat. It found (a + a) instead of a, which are semantically identical given that a is a constant the optimizer is free to halve. If we simplify the equation manually to a - (x >> 1), the second stage found an optimal value around 0x5f33f870, which is pretty close to the original magic constant. Fine-tuning the Nelder-Mead algorithm of the second stage would give an even better constant. The general behaviour of the function is shown in the following figures.
Figure 1: Output of a - (x >> 1) (blue) and 1/√x (green) for input values [0, 50].
Figure 2: General view of the relative error of the a - (x >> 1) function
Figure 3: Relative error of the a - (x >> 1) function for different values of a, zoomed around 0x5f340000
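If you want to reproduce the relative-error comparison between the evolved constant and Quake's, here is a quick standalone snippet (my own, assuming the same input points the evolution used and no Newton-Raphson polishing):

import struct

def rsqrt_raw(x, magic):
    # The raw shift-and-subtract bit hack, without the Newton-Raphson step.
    i = struct.unpack('<i', struct.pack('<f', x))[0]
    return struct.unpack('<f', struct.pack('<i', magic - (i >> 1)))[0]

for magic in (0x5f3759df, 0x5f33f870):
    # Relative error |y * sqrt(x) - 1|, since the exact answer is 1/sqrt(x)
    errs = [abs(rsqrt_raw(x, magic) * x ** 0.5 - 1) for x in (i / 10. for i in range(1, 50))]
    print(hex(magic), max(errs))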

Facing my Waterloo

Sadly, my selfish self found no better trick than the one in the Quake III source. At least, the original™ fast inverse square root trick was rediscovered. In the next episode, we'll try to uncover novel and unpredictable approximations of other common expressions. Stay tuned!