tag:blogger.com,1999:blog-5699502648267550922024-03-12T19:26:46.750-04:00multigradYannick Hold-Geoffroyhttp://www.blogger.com/profile/17227249978993145288noreply@blogger.comBlogger7125tag:blogger.com,1999:blog-569950264826755092.post-81128357541734961832014-07-24T11:11:00.003-04:002014-07-28T15:25:45.633-04:00Executing x86-64 opcodes in Python - Genetic Programming case study<div style="text-align: justify;">
This post is about generating machine code directly and make it run in Python. (At least, <a href="http://nickdesaulniers.github.io/blog/2014/04/18/lets-write-some-x86-64/">i'm not the first one</a> to do something similar.) Since we've left any hope for portability by using machine code, we'll support <i>my</i> processor, which implements the (not so pretty) standard <a href="http://en.wikipedia.org/wiki/X86-64">x86-64 instruction set</a>.</div>
<h3 style="text-align: justify;">
Jumping down the abstraction layers</h3>
<div>
<div style="text-align: justify;">
First of all, how do you even execute arbitrary x86-64 instructions in Python? (Note: doing it in Python is mandatory, mainly because I want it to. And it's simpler to develop and debug.) Well, diving into the <a href="https://docs.python.org/3/library/ctypes.html">ctypes module</a> gives us half the answer: we can write a byte string that contains the executable code and then cast it into a <a href="https://docs.python.org/3/library/ctypes.html#ctypes.CFUNCTYPE">CFUNCTYPE</a> object.</div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
<div>
<div style="text-align: justify;">
But an portability issue arises. CFUNCTYPE states that it will call the function using the standard C convention (what we call an ABI)... But <a href="http://en.wikipedia.org/wiki/X86_calling_conventions">which x86 calling convention</a>? The answer is: the one that is used by the compiler with which Python was compiled. Luckily, most x86-64 calling conventions (there are two: <a href="http://msdn.microsoft.com/en-us/library/ms235286.aspx">Microsoft-flavored</a> for Windows-based systems and <a href="http://www.x86-64.org/documentation/abi.pdf">System V-flavored</a> for anything else) are pretty similar. Another lucky fact: we only have to take care of the volatile registers and the parameter passing registers and return registers.</div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
<div>
<div style="text-align: justify;">
Next, we must know the machine code that does what we want. It is worth noting that an <a href="http://gcc.godbolt.org/">interactive compiler</a> and an <a href="https://defuse.ca/online-x86-assembler.htm#disassembly">online assembler</a> are useful to generate the machine code sequence. The same output is achieved using gcc to compile a file containing a simple function as such:</div>
<div class="gistLoad" data-id="340c0d948620ce463683" id="gist-340c0d948620ce463683">
<div style="text-align: justify;">
Loading... Please enable Javascript</div>
</div>
<div style="text-align: justify;">
Note that this invocation of gcc won't execute the linker, so every call, jump or other reference to a label (an address) won't be defined - they will be set to 0. This won't hinder us as our addresses won't be the same as the sample C code anyway. To better understand what we are dealing with, it is possible to check out the opcodes in an <a href="http://ref.x86asm.net/coder.html">an x86-64 opcode reference chart</a>, or the <a href="http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf">intel architecture software developer manual</a> if the gory details are needed.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Ok, everything should be fine now! Let's try to cast a simple return. This operands only returns from a function call without doing anything.</div>
<div class="gistLoad" data-id="fbda329a1b78f1188bab" id="gist-fbda329a1b78f1188bab">
<div style="text-align: justify;">
Loading... Please enable Javascript</div>
</div>
<div style="text-align: justify;">
And... That gives a segmentation fault.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Which is pretty logical: the string is contained in a segment of the memory that have the read and write flags, but not the execution flag. Luckily, this particular problem was <a href="http://stackoverflow.com/questions/275207/python-ctypes-and-function-calls">solved on stackoverflow</a>. This gives us the following (x86-64 Linux only) code:</div>
<div class="gistLoad" data-id="d936ad17913d2636810f" id="gist-d936ad17913d2636810f">
<div style="text-align: justify;">
Loading... Please enable Javascript</div>
</div>
<div style="text-align: justify;">
At last, it works! Or, at least, doesn't explode upon execution.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
So, basically, if I want to perform an addition, <a href="https://docs.python.org/3/library/ctypes.html#ctypes.CFUNCTYPE">the documentation</a> says to interface the "C function" this way (to replace the end of the previous code snippet):</div>
</div>
<div class="gistLoad" data-id="8a5a15bbe3853a4ffeb6" id="gist-8a5a15bbe3853a4ffeb6">
<div style="text-align: justify;">
Loading... Please enable Javascript</div>
</div>
<div>
<div style="text-align: justify;">
Great, it works! But how did we come up with the machine code f2 0f 58 c1 c3 ? (the prefix \x before every byte is only to signify to Python that it's hexadecimal.) Here we go:</div>
</div>
<div class="gistLoad" data-id="8e94a42d46cc6d616ace" id="gist-8e94a42d46cc6d616ace">
<div style="text-align: justify;">
Loading... Please enable Javascript</div>
</div>
<h3 style="text-align: justify;">
The missing link</h3>
<div style="text-align: justify;">
Now that the hassle of explaining evolutionary algorithms, genetic programming and their need for fast execution <a href="http://multigrad.blogspot.ca/2014/06/fun-with-python-bytecode.html">has been done by my colleague</a>, let's push it a step further using the aforementioned notions . Marc-André brought you on the edge of the cliff; we'll now take a big step forward. While he used Python bytecode, which is an awesome and versatile idea that supports any hardware for which Python was ported, we'll use machine code.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Now that we know how we are going to execute arbitrary machine code, we can focus on the problem beforehand: generate machine code from a program tree. Let's take for example the tree for the equation <span style="text-align: center;">$ (x - 1) + 4 $ :</span></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-Sdjp8sVPsUH0bRQCqShxsYxxWMAtlMDtDk0DH35YT9rmw5eqh1p0Wl5MqIe4N1csjo1UL4dRAO2qYxGfBGaWTcUKIsLpFGBsios50Jx6ElxAWUURBlkTDmdvLBB3CTlEoFplFNMbnCQ/s1600/GP-Tree.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh-Sdjp8sVPsUH0bRQCqShxsYxxWMAtlMDtDk0DH35YT9rmw5eqh1p0Wl5MqIe4N1csjo1UL4dRAO2qYxGfBGaWTcUKIsLpFGBsios50Jx6ElxAWUURBlkTDmdvLBB3CTlEoFplFNMbnCQ/s1600/GP-Tree.png" height="320" width="313" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
Figure 1: Example of the tree representing the expression $ (x - 1) + 4 $.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Since the processor execution is based on a stack, the easiest way to traverse this three is depth-first. Indeed, by appending every node of the tree when traversing depth-first, it generates the inverted order of execution of the tree. Here's how it works:</div>
<div style="text-align: justify;">
<div style="text-align: center;">
<br /></div>
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJARRwYU9ODE10Iq-eSbGHEnAGXdCfiGU4N9KumYuO-5HtHhgoUogexec8MqMPeFyo5lfIJ0BX2tsbcPGhI3Xy295_FQ-6egMwo8XSasRqcMMrzRu-3cxz-skKfrT7njffm8TVlGFud_c/s1600/GP-Tree+(1).png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJARRwYU9ODE10Iq-eSbGHEnAGXdCfiGU4N9KumYuO-5HtHhgoUogexec8MqMPeFyo5lfIJ0BX2tsbcPGhI3Xy295_FQ-6egMwo8XSasRqcMMrzRu-3cxz-skKfrT7njffm8TVlGFud_c/s1600/GP-Tree+(1).png" height="247" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
Figure 2: Example of depth-first traversing with call stack</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
As we traverse the tree, we pile up the calls or parameters on the stack. As you can see from the figure, it generates a call stack that, when executed bottom-up, is exactly the order that the x86_64 processor needs. We notice that it is different from the Python bytecode stack Marc-André showed: arguments positions are reversed and <span style="font-family: Courier New, Courier, monospace;">LOAD_GLOBAL </span>is replaced by the call to the function.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
All that seems well, but there is a problem. x86_64 calling convention passes the floating-point parameters of a function on XMM0, XMM1, XMM2 and so on. These registers are volatile, meaning that their content may well be modified when we call a function. Let's take the tree showed in the previous figure for the sake of example and assume we're dealing with floats. X will be put on XMM0, 1.0f on XMM1 and <span style="font-family: Courier New, Courier, monospace;">sub()</span><span style="font-family: inherit;"> will be called. This call will return its result on XMM0. Perfect, that's where we want our first argument for the call to </span><span style="font-family: Courier New, Courier, monospace;">add()</span><span style="font-family: inherit;">, along with 4.0f previously put in XMM1. Uh-oh. </span><span style="font-family: Courier New, Courier, monospace;">sub()</span> needs 1.0f in XMM1 while <span style="font-family: Courier New, Courier, monospace;">add()</span> needs 4.0f. This can be visualized here:</div>
<style>
.data-table {
border-collapse: collapse;
margin: 0 auto; width: 50%;
}
.border-top {
border-top: 1px solid #000;
}
.border-bottom {
border-bottom: 1px solid #000;
}
.border-left {
border-left: 1px solid #000;
}
.border-right {
border-right: 1px solid #000;
}
.centered-text {
text-align: center;
}
</style>
<br />
<table class="data-table" style="text-align: justify;">
<tbody>
<tr>
<th class="border-top border-bottom border-left border-right">Instruction</th>
<th class="border-top border-bottom border-right centered-text">XMM0</th>
<th class="border-top border-bottom border-right centered-text">XMM1</th>
</tr>
<tr>
<td class="border-bottom border-left border-right">Put 4.0f in XMM1</td>
<td class="border-bottom border-right centered-text">-</td>
<td class="border-bottom border-right centered-text">4.0f</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">Put 1.0f in XMM1</td>
<td class="border-bottom border-right centered-text">-</td>
<td class="border-bottom border-right centered-text">1.0f</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">Put x in XMM0</td>
<td class="border-bottom border-right centered-text">x</td>
<td class="border-bottom border-right centered-text">1.0f</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">Call sub()</td>
<td class="border-bottom border-right centered-text">x - 1.0f</td>
<td class="border-bottom border-right centered-text">1.0f (?)</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">Call add()</td>
<td class="border-bottom border-right centered-text">(x - 1.0f) + ?</td>
<td class="border-bottom border-right centered-text">1.0f (?)</td>
</tr>
</tbody></table>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
You'll probably say "hey, this is easy, simply put node 5 before node 2, problem solved!" Don't. Do. That. <a href="http://www.amazon.com/Haribo-Gummi-Bears-Sugar-Free/product-reviews/B000EVQWKC">All hell will break loose</a> and <a href="http://www.youtube.com/watch?v=MaYmZ5mw0DM#t=168">you will not enjoy this</a>. This kind of tweaking will lead you to hours and hours of wondering why it works in some cases but won't in such or such cases. (Says the guy who produces executable machine code in Python.) As I said earlier: this should be simple, an x86_64 CPU is based on a stack! Let's use it! We'll simply push every argument needed on the CPU stack when we traverse its node and then pop it back when it's needed. If we feel adventurous, we realize that the first argument (next node after a call) won't need to push/pop if the arguments are compatible: the result will already be in XMM0 (for floats and doubles), ready to be used. This gives us this:</div>
<table class="data-table" style="text-align: justify;">
<tbody>
<tr>
<th class="border-top border-bottom border-left border-right">Instruction</th>
<th class="border-top border-bottom border-right centered-text">XMM0</th>
<th class="border-top border-bottom border-right centered-text">XMM1</th>
</tr>
<tr>
<td class="border-bottom border-left border-right">Push 4.0f on stack</td>
<td class="border-bottom border-right centered-text">-</td>
<td class="border-bottom border-right centered-text">-</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">Push 1.0f on stack</td>
<td class="border-bottom border-right centered-text">-</td>
<td class="border-bottom border-right centered-text">-</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">Put x in XMM0</td>
<td class="border-bottom border-right centered-text">x</td>
<td class="border-bottom border-right centered-text">-</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">Pop stack in XMM1</td>
<td class="border-bottom border-right centered-text">x</td>
<td class="border-bottom border-right centered-text">1.0f</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">Call sub()</td>
<td class="border-bottom border-right centered-text">x - 1.0f</td>
<td class="border-bottom border-right centered-text">?</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">Pop stack in XMM1</td>
<td class="border-bottom border-right centered-text">x - 1.0f</td>
<td class="border-bottom border-right centered-text">4.0f</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">Call add()</td>
<td class="border-bottom border-right centered-text">(x - 1.0f) + 4.0f</td>
<td class="border-bottom border-right centered-text">?</td>
</tr>
</tbody></table>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
As we can see, this generates quite a lot of unnecessary pushes (for example argument 2 of <span style="font-family: Courier New, Courier, monospace;">sub()</span>). Eliminating these unnecessary stack usage is a potential optimization that we may discuss in another article.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Before rambling into a <i>non sequitur</i> madness of idea flow almost disconnected from the subject, I'll present you the symbolic regression program with individuals evaluated in x86_64 machine code using <a href="https://github.com/DEAP/deap">deap</a>. It is located <a href="https://gist.github.com/soravux/1fe0992a79fc07a23d27">here</a>. Feel free to fork it, mess with it and be curious around it.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
I haven't implemented the division (had to deal with divisions by zeros) nor cos / sin, but feel free to be <a href="http://gamedev.stackexchange.com/questions/4779/is-there-a-faster-sine-function">inspired</a> and fork the code!</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
You may be tempted to print out the machine code generated and understand it. To better understand it, it is possible to copy it in an <a href="http://onlinedisassembler.com/odaweb/">online disassembler</a> which will provide the almost human-readable assembly translation.</div>
<h3 style="text-align: justify;">
To the Infinity and beyond</h3>
<div style="text-align: justify;">
In the context of genetic programming, a way better idea than generating the machine code at each evaluation as the previous example did is to represent the individuals in deap as its machine code and evolve it. It's not that I am wary of the tortuous path toward a stable implementation of this representation. It would require the writing of a mutation and crossover function which needs pointers to mark the beginning and end of each node in the machine code representation. No, the only reason I don't dig further in this <a href="http://www.youtube.com/watch?v=FWBUl7oT9sA&feature=kp">general direction</a> is because I won't offend you by serving some old reheated matter. Marc-André already showed how it's done <a href="http://multigrad.blogspot.ca/2014/06/fun-with-python-bytecode.html">in its previous post</a>. (This has also nothing to do with the fact that it's an unmaintainable piece of code that won't ever be published as-is in a working project.)<br />
<br />
At first, this proposed idea was tagged "useless waste of time" by colleagues and friends. But as the idea developed, we realized it could be used to circumvent security features. Calling obscure opcodes, low-level functions, software interruptions or similar are now available directly in pure Python. Furthermore, it would enable the execution of dynamic libraries that are not flagged as executable. You only have to read the exported symbols of a .so library, load them in memory, apply this method and <i>voilà</i>, you can execute its functions.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
An interesting lead from this point is to make a compiler for generic Python functions. Some <a href="https://github.com/eliben/pycparser">pretty module</a> would get us near a C compiler, but I won't insult the dedication and hard work of Donald Knuth by proposing half an article written on a napkin on compiler creation. But I don't mind being familiar and even <i>tutoyer</i> Python optimizations packages. Can we perform better than <a href="http://numba.pydata.org/">Numba</a> or even <a href="http://cython.org/">Cython</a> and <a href="http://pypy.org/">PyPy</a>? Stay tuned for Part II were we'll try as-generic-as-possible Python-to-Machine-Code translation.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<h3>
</h3>
Yannick Hold-Geoffroyhttp://www.blogger.com/profile/17227249978993145288noreply@blogger.com2tag:blogger.com,1999:blog-569950264826755092.post-39960460661743807732014-06-08T13:08:00.000-04:002014-06-25T09:56:13.603-04:00Fun with Python bytecode<div align="justify">
The Python programming language is now present
everywhere. While significantly slower than other more low-level
languages like C/C++, its ease of use, its power and its various
libraries make it an ideal choice for many projects. Yet for some
cases, performance does matter. For these, many solutions have been
proposed. One of the most common is the use of an underlying,
heavily-optimized, C library along with Python bindings (numpy,
scipy, opencv, etc.). Another is the addition of performance related
features to the interpreter, like JIT or stackless mode with Pypy.
However, what to do when you can't apply neither of them, yet you
still want more performance without sacrificing the Python zen and
simplicity?</div>
<div align="justify">
<br />
One of the answers, at least in the present case,
is to roll up our sleeves, fasten our seat belts, and dive into the
mysteries of the Python bytecode.<br />
<h3>
Wait, what?</h3>
<div align="JUSTIFY">
You have probably already heard that Python, unlike
Fortran, C++ or Go, was an <i>interpreted </i><span style="font-style: normal;">language.
While not completely wrong, it is not entirely true either. You </span><i>could</i><span style="font-style: normal;">
create a Python interpreter that works in a purely interpreted way,
but it is not how it is done in most current Python implementations.
The Python code is first </span><i>compiled </i><span style="font-style: normal;">into
a series of simpler instructions. Thereafter, </span><i>these</i><span style="font-style: normal;">
instructions are actually interpreted by a virtual machine. The main
reason for that design choice is speed: directly interpreting
high-level Python code each time it is executed (think for instance
of a loop or a function executed many times) would be ineffective.
Moreover, some different Python syntax turn out to give the same
result. For instance, using a </span><i>if/elif</i><span style="font-style: normal;">
statement produces </span><i>exactly </i><span style="font-style: normal;">the
same result than a </span><i>if</i><span style="font-style: normal;">
inside the </span><i>else </i><span style="font-style: normal;">clause
of another condition. A list comprehension has the same behavior than
a traditional loop – except for some underlying details in the
current implementations, but it </span><i>could</i><span style="font-style: normal;">
be exactly the same thing. The use of a simplified, intermediate
language allows the actual interpreter to be simpler, easier to
understand and maintain, and faster, while being entirely hidden to
the user in most cases. With CPython, the most visible effect is the
creation of </span><i>*.pyc</i><span style="font-style: normal;">
files, used to store the bytecode in order to avoid a recompilation
step each time the file is executed.</span></div>
<div align="JUSTIFY">
<br />
It is to be noted that this design is not
Python-specific, since many other prominent languages use it too
(Java, Ruby, etc.). All the designs are more or less identical, the
steps involved being a) the parsing of the high-level language into
an intermediate, simpler representation, and b) the interpretation of
this intermediate representation to actually run the program. One
could note that Python makes use of a fancier design, by
translating the Python code into Abstract Syntax Tree or AST before
create the bytecode, but it is out of the scope of this post.
However, those interested by the CPython internals could refer to the
<a href="https://docs.python.org/devguide/compiler.html">Cpython compiler developers information</a>.<br />
<br /></div>
<div align="JUSTIFY">
In Cpython 3.4 (the reference interpreter), the
bytecode is based on a stack representation, and 101 different
<i>opcodes</i>. Python 2.7 uses 18 more, mainly for slicing
operations, which where separated from the item-based operations, and
<i>print</i> and <i>exec</i> statements, which became functions in
Python 3. For the rest of this post, we will use CPython 3.4. While
the explanations provided here also mostly apply to other versions
and interpreters, they may not be totally accurate.<br />
<br />
Let's take a look at an actual bytecode snippet.
Suppose you have the following simple function :</div>
<div class="gistLoad" data-id="eb58151a91ae789d4ef4" id="gist-eb58151a91ae789d4ef4">
Loading... Please enable Javascript</div>
<div align="JUSTIFY">
We could observe the bytecode generated by Python by
using the __code__ attribute :</div>
<div align="JUSTIFY" style="margin-left: 1.25cm;">
<span style="font-family: Courier, monospace;">In
[9]: myMathOperator.__code__.co_code </span>
</div>
<div align="JUSTIFY" style="margin-left: 1.25cm;">
<span style="font-family: Courier, monospace;">Out[9]:
b'|\x00\x00|\x01\x00|\x02\x00\x17d\x01\x00\x13\x14S'</span></div>
<div align="JUSTIFY">
<br />
Ok... <span style="font-style: normal;">While
expected (since the bytecode is basically a sequence of bytes), the
result is not exactly </span><i>readable</i><span style="font-style: normal;">.
Fortunately, there is a helper module in the Python library which can
render a bit more understandable code. This module is named </span><i>dis</i><span style="font-style: normal;">,
for disassembly. Let's see what it can do :</span></div>
<div align="JUSTIFY" style="font-style: normal; line-height: 100%; margin-bottom: 0cm; margin-left: 1.25cm;">
<span style="font-family: Courier, monospace;">In [12]: import dis </span>
</div>
<div align="JUSTIFY" style="font-style: normal; line-height: 100%; margin-bottom: 0cm; margin-left: 1.25cm;">
<span style="font-family: Courier, monospace;">In [13]:
print(dis.Bytecode(myMathOperator.__code__).dis()) </span>
</div>
<div align="JUSTIFY" style="font-style: normal; line-height: 100%; margin-bottom: 0cm; margin-left: 1.25cm;">
<span style="font-family: Courier, monospace;">2 0 LOAD_FAST
0 (a) </span>
</div>
<div align="JUSTIFY" style="font-style: normal; line-height: 100%; margin-bottom: 0cm; margin-left: 1.25cm;">
<span style="font-family: Courier, monospace;">3 LOAD_FAST
1 (b) </span>
</div>
<div align="JUSTIFY" style="font-style: normal; line-height: 100%; margin-bottom: 0cm; margin-left: 1.25cm;">
<span style="font-family: Courier, monospace;">6 LOAD_FAST
2 (c) </span>
</div>
<div align="JUSTIFY" style="font-style: normal; line-height: 100%; margin-bottom: 0cm; margin-left: 1.25cm;">
<span style="font-family: Courier, monospace;">9 BINARY_ADD </span>
</div>
<div align="JUSTIFY" style="font-style: normal; line-height: 100%; margin-bottom: 0cm; margin-left: 1.25cm;">
<span style="font-family: Courier, monospace;">10 LOAD_CONST
1 (2) </span>
</div>
<div align="JUSTIFY" style="font-style: normal; line-height: 100%; margin-bottom: 0cm; margin-left: 1.25cm;">
<span style="font-family: Courier, monospace;">13 BINARY_POWER </span>
</div>
<div align="JUSTIFY" style="font-style: normal; line-height: 100%; margin-bottom: 0cm; margin-left: 1.25cm;">
<span style="font-family: Courier, monospace;">14 BINARY_MULTIPLY </span>
</div>
<div align="JUSTIFY" style="font-style: normal; line-height: 100%; margin-bottom: 0cm; margin-left: 1.25cm;">
<span style="font-family: Courier, monospace;">15 RETURN_VALUE</span></div>
<div align="JUSTIFY">
<span style="font-style: normal;"><br /></span>
<span style="font-style: normal;">Well, that's
definitely better! The dis module is able to translate the bytecode
into its opcodes and their arguments, which are far more readable.
However, to find what does a bytecode do, we must place ourselves in
a </span><i>stack</i><span style="font-style: normal;"> context. Most
opcodes modify the stack by either adding or removing elements (or
both), or by changing the value of the top of stack pointer. For
instance, </span><i>LOAD_FAST</i><span style="font-style: normal;"> is
described as:</span></div>
<div align="JUSTIFY" style="margin-left: 1.25cm;">
<i><b>Pushes a reference to
the local co_varnames[var_num] onto the stack.</b></i></div>
<div align="JUSTIFY">
<span style="font-style: normal;"><br /></span>
<span style="font-style: normal;">For now, we can
forget the reference to co_varnames (we will talk about it later on),
and just retain that this opcode fetches a variable and put it onto
the stack. In our example, the stack is initially empty since it is
the beginning of the function (we will represent it as [ ]).
Supposing that the values of a, b, and c arguments are respectively
</span><i>12</i><span style="font-style: normal;">, </span><i>7</i><span style="font-style: normal;">
and </span><i>1</i><span style="font-style: normal;">, then the stack
will contain [12, 7, 1] after the first three statements execution.</span></div>
<div align="JUSTIFY">
<span style="font-style: normal;"><br /></span>
<span style="font-style: normal;">On the next line,
we reach a new opcode, </span><i>BINARY_ADD</i><span style="font-style: normal;">.
In the documentation, it is said that it:</span></div>
<div align="JUSTIFY" style="margin-left: 1.25cm;">
<i><b>Implements TOS = TOS1
+ TOS</b></i></div>
<div align="JUSTIFY" style="font-style: normal;">
<br />
Ok, that's a bit less
clear. Here, TOS means “Top Of the Stack”. So, basically, it
takes the value at the top of the stack and the second value at the
top (TOS1), add them, and push back the result of the top of the
stack. Applying this operation to our [12, 7, 1] stack, we obtain
[12, 8].</div>
<div align="JUSTIFY">
<br />
Moving on to the next opcode, we find a <i>LOAD_CONST</i><span style="font-style: normal;">.
Basically, it does the same job as </span><i>LOAD_FAST</i><span style="font-style: normal;">,
except that it loads constants and not variables (in our case, the
constant loaded is the 2 used as exponent). So our stack now contains
[12, 8, 2].</span></div>
<div align="JUSTIFY">
<span style="font-style: normal;"><br /></span>
<span style="font-style: normal;">The next opcode,
</span><i>BINARY_POWER</i><span style="font-style: normal;"> does,
according to the documentation:</span></div>
<div align="JUSTIFY" style="margin-left: 1.25cm;">
<i><b>TOS = TOS1 ** TOS</b></i></div>
<div align="JUSTIFY" style="font-style: normal;">
<br />
As for the addition, we
take the two top-most items, exponent the second by the first, and
push back the result on the stack, which now contains [12, 64].</div>
<div align="JUSTIFY" style="font-style: normal;">
<br />
The next opcode is
<i>BINARY_MULTIPLY</i>, which works similarly than <i>BINARY_POWER</i>
and <i>BINARY_ADD</i>, and our stack now contains the product of 12
and 64, that is [768]. Finally, the <i>RETURN_VALUE</i> operation is
said to:</div>
<div align="JUSTIFY" style="margin-left: 1.25cm;">
<i><b>Returns with TOS to
the caller of the function.</b></i></div>
<br />
That is, it pops the value on the top of the stack, and return it
to the calling function. In our case, the answer (768) is effectively
returned, and we're done for this example.<br />
<div align="JUSTIFY">
There are of course many other opcodes, but their
behavior is essentially the same, popping and pushing values and
references on a global stack. As one can see, interpreting (and even
writing) Python bytecode is fairly simple when we get the twist.</div>
<div align="JUSTIFY">
<br />
A last twist before the next step: some opcodes need
arguments. This is the case for <i>LOAD_FAST</i><span style="font-style: normal;">,
which needs the index of the variable to grab. This is achieved very
simply, by using up to three bytes for each operation. The first byte
is the opcode itself, and the two remaining can be used for parameter
passing. If there is no parameter to pass (like </span><i>BINARY_ADD</i><span style="font-style: normal;">),
then only one byte is used. The attentive reader would have noticed
the curious indexes in the bytecode snippet above: they precisely
mark the byte index of each opcode.</span></div>
<br />
Great, so we are now ready! Let's write bytecode instead of Python
code!<br />
<h3>
Wait, why?</h3>
<div align="JUSTIFY">
At this point, a reasonable mind might say: why in
the world would you bother to write bytecode, a limited and difficult
language, instead of actual Python code? Well, because the
compilation procedure is too slow and induces a significant overhead!
To this answer, the same reasonable mind would probably say: who
cares about the compilation time? It is not so slow (a few tenths of
seconds in most cases), and, even if it was not, this must be done
only once – since the next times, you can just reload the *.pyc and
go happily!</div>
<div align="JUSTIFY">
<br />
In most cases, the clever guy would be right. But
there's a specific use which has to be taken into account. Its name
is GP, standing for <i>Genetic Programming</i><span style="font-style: normal;">.</span></div>
<div align="JUSTIFY">
<br />
Genetic programming is an hyper-heuristics member of
the bigger family of the <i>evolutionary algorithms</i>. Basically,
it uses natural selection, mutations and other evolutionary concepts
to <i>evolve</i><span style="font-style: normal;"> trees (in the
computer sense). As any program can be represented as a tree, genetic
programming is actually able to evolve </span><i>programs</i><span style="font-style: normal;">,
that is learn, alone, the best algorithm for a given problem, like
getting out of a labyrinth, solving the Rubik Cube, etc. We do not
intend to dive into deeper explanations on GP, but the interested
reader could find introductory and advanced material about it at </span><a href="http://www.gp-field-guide.org.uk/">www.gp-field-guide.org.uk</a>.</div>
<div align="JUSTIFY">
<span style="font-style: normal;"><br /></span>
<span style="font-style: normal;">As a DEAP
contributor and user, I frequently have to work with GP and its DEAP
implementation (<a href="http://github.com/deap/deap">DEAP</a> is a generic evolutionary algorithms framework
in Python, which implements GP among others). Here, the compilation
time becomes a problem. GP uses a </span><i>population</i><span style="font-style: normal;">
of several hundreds, and even thousands of different programs. This
is mandatory to keep a sufficient level of diversity. But in order to
see the results produced by each of these programs, we must execute
them, and, as they change during the evolution, we have to compile
them each time. Consider the next figure, produced with </span><i>gprof2dot</i><span style="font-style: normal;">
and the Python </span><i>profile</i><span style="font-style: normal;">
module, which represents the execution of a typical evolution with
DEAP. One can see that the compilation time represented by the
gp.compile function, in light green, takes almost 50% of the </span><i>total</i><span style="font-style: normal;">
computation time. In other words, we spend half our computational
effort just to compile the programs!</span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6V_0nt3WHSE84SSgfDKMAT6gJ_UW6H9IPtv3xFtjG4gLre21seNkiZpDgYZRal-6EX5K2VoCfbitW5lnE-ZFCu0blUYPIOCGedsZGLNKB16yxXeloygkb30FGri3AkO9AHzN7k3g3g5M/s1600/dot-analysis.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6V_0nt3WHSE84SSgfDKMAT6gJ_UW6H9IPtv3xFtjG4gLre21seNkiZpDgYZRal-6EX5K2VoCfbitW5lnE-ZFCu0blUYPIOCGedsZGLNKB16yxXeloygkb30FGri3AkO9AHzN7k3g3g5M/s1600/dot-analysis.png" height="480" width="640" /></a></div>
<br />
<div align="JUSTIFY">
A few solution have been proposed to address the
problem. The compilation of the individuals with LLVM, as shown <a href="http://pyevolve.sourceforge.net/wordpress/?p=2353">here</a>, is one of them.
Since it compiles to bare x86 bytecode, it indeed reduces the
execution time, but at the price of an increase in compilation time,
the penalty being more severe if we also add compiler optimizations.
Also, the use of an optimized Python interpreter like Pypy is not of
great help with GP – it actually <i>decreases </i><span style="font-style: normal;">performance
by a factor 2. Shortly, this can be explained by the JIT design
itself: just-in-time compilers are efficient with long loop and
repetitive code segments, since the whole idea is to make up for the
compilation time by an important speedup in upcoming executions of
the same code. With GP, the code is constantly changing, so the JIT
either does nothing (because it detects that it would not be of any
use) or deteriorate the processing time by trying to compile every
individual at each evaluation. Finally, the use of an heavily
optimized C implementation of the basic functions used in the GP
program is not helpful, since Python still have to compile the high
level code </span><i>before</i><span style="font-style: normal;">
calling this C implementation!</span></div>
<div align="JUSTIFY">
<br />
That's where our bytecode hobby comes on stage. What
if we directly evolve Python bytecode? Well (spoiler alert), the
results obtained with a preliminary implementation are given in the
next figure. It should be noted that we use the exact same problem,
configuration, and random seed for our standard and optimized
implementations. The results are impressive: the compile time is now
negligible (less than 1% of the total execution time), and the
computation time is divided by more than two (47 seconds vs. 107 for
the standard implementation), without altering the results in any
way! It should be noted that this is the maximum speedup achievable,
since the standard compilation procedure took about half of the
program execution time. The small additional edge is a side effect
caused by more effective implementations of other compile related
methods<span style="font-style: normal;">.</span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiY-zq0q8YC9X51JHaglFAO3JHw0kAjP3-Qp0ODfw257IGbZ9W2sMAExnnx2x3DqU0FD6vnqKw_bDt4qnprdMXykuZVqhEaUXAda9kca6w2T-ltPqtt7DtYHGN2aFjePrHpCsSCbWZjFfo/s1600/dot-optim.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiY-zq0q8YC9X51JHaglFAO3JHw0kAjP3-Qp0ODfw257IGbZ9W2sMAExnnx2x3DqU0FD6vnqKw_bDt4qnprdMXykuZVqhEaUXAda9kca6w2T-ltPqtt7DtYHGN2aFjePrHpCsSCbWZjFfo/s1600/dot-optim.png" height="570" width="640" /></a></div>
<br />
Ok, great, so that was easy! Another problem solved!<br />
<h3>
Wait, how?</h3>
<div align="JUSTIFY">
In order to understand the intricacies of the
solution, we have to learn a bit about how GP is implemented in DEAP.
As we said earlier, GP works by evolving <i>trees</i><span style="font-style: normal;">.
Since there's no tree implementation in the Python standard library,
DEAP includes one specifically targeted on GP. It works by storing the
trees into an underlying list, along with information about how many
children each node have. For instance, the mathematical function </span><i>f(x)
= (x-1) + x*x</i><span style="font-style: normal;"> can be represented
by the following tree:</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEie1Xz50cGMu2mVdPNg6g8ssQH0F_Tlj3Ed4dbsWfhXR8uqJdC7f6eEB3gWCs5yGRvk-EjSs0K392N4jQT_Vmr42SKzPMl5XECTe38TX9VVaC6454M6ukYAvgYlBHX2JSzl_DCU7LD4_a8/s1600/tree-example.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEie1Xz50cGMu2mVdPNg6g8ssQH0F_Tlj3Ed4dbsWfhXR8uqJdC7f6eEB3gWCs5yGRvk-EjSs0K392N4jQT_Vmr42SKzPMl5XECTe38TX9VVaC6454M6ukYAvgYlBHX2JSzl_DCU7LD4_a8/s1600/tree-example.png" height="199" width="320" /></a></div>
<span style="font-style: normal;"><br /></span></div>
<div align="CENTER" style="font-style: normal;">
</div>
<div align="JUSTIFY" style="font-style: normal;">
and will be stored in the
following list:</div>
<div align="JUSTIFY" style="font-style: normal;">
<div style="text-align: center;">
<span style="font-family: Courier New, Courier, monospace;"><b>['add', 'subtract', 'x',
1, 'multiply', 'x', 'x']</b></span></div>
</div>
<div align="JUSTIFY">
<span style="font-style: normal;"><br /></span>
<span style="font-style: normal;">Because DEAP knows
the number of arguments of each function (or, in genetic programming
gibberish, their </span><i>arity</i><span style="font-style: normal;">),
it can reconstitute the tree. Thereafter, when comes the evaluation
step, the following procedure is used:</span></div>
<ol>
<li><div align="JUSTIFY">
<span style="font-style: normal;">The tree is
converted into its string representation. Basically, it is the
program that one would write if he wanted to produce the result of
the tree. For instance, the string representation of the previous
tree wou</span>ld be <i>add( subtract(x, 1), multiply(x, x) )</i>.</div>
</li>
<li><div align="JUSTIFY">
<span style="font-style: normal;">This string is
passed to </span><i>eval()</i><span style="font-style: normal;">.
This function is a powerful (and dangerous!) tool which allows to
execute arbitrary code from its string representation. For instance,
the following code would write “Hello world!” to the standard
output:</span></div>
<div align="JUSTIFY">
<blockquote class="tr_bq">
<i><span style="background-color: white; font-family: Courier New, Courier, monospace;">eval('print(“Hello
world!”)')</span></i></blockquote>
</div>
<div align="JUSTIFY" style="font-style: normal;">
It is in this step that
the Python code is actually compiled into its executable form.</div>
</li>
<li><div align="JUSTIFY" style="font-style: normal;">
The compiled
function is returned, so it can be evaluated.</div>
</li>
</ol>
<div align="JUSTIFY" style="font-style: normal;">
The reader would have
understood that the last step is mandatory and cannot be suppressed,
which is why we focus on the first two steps with our bytecode hack.</div>
<div align="JUSTIFY">
<br /></div>
<div align="JUSTIFY">
<span style="font-style: normal;">First, one useful
simplification: the resulting program is merely only a sequence of
calls to different functions. We never actually have to understand
what is going </span><i>inside</i><span style="font-style: normal;">
these functions. This leads to the fact that our implementation does
not restrain at all the function choice. If it works in Python, it
will work with our approach! Also, this reduce our solution
complexity, since we only have to call a few opcodes. Basically, we
will need the following:</span></div>
<ul>
<li><div align="JUSTIFY">
<i>LOAD_FAST</i><span style="font-style: normal;">
: this will be needed to access to our program arguments.</span></div>
</li>
<li><div align="JUSTIFY">
<i>LOAD_CONST</i><span style="font-style: normal;">
: similarly, this opcode is required to use constant values
(numerical or not).</span></div>
</li>
<li><div align="JUSTIFY">
<i>LOAD_GLOBAL</i><span style="font-style: normal;">
: this one will be used to retrieve references to the function
objects of our tree. As its name suggests, instead of grabbing a
symbol from the local dictionary like </span><i>LOAD_FAST</i><span style="font-style: normal;">,
it uses the global dictionary.</span></div>
</li>
<li><div align="JUSTIFY">
<i>CALL_FUNCTION</i><span style="font-style: normal;">
: obviously, this is mandatory to actually call the functions
previously loaded. It takes as parameter the number of arguments to
pass to the function. All these arguments must be on the stack at
the moment of the call, plus, under them, a reference to the
function object itself. This particular structure will prove to be
very useful hereinafter.</span></div>
</li>
<li><div align="JUSTIFY">
<i>RETURN_VALUE</i><span style="font-style: normal;">
: this one will actually be needed only one time, to return the
final result value.</span></div>
</li>
</ul>
<div align="JUSTIFY">
<span style="font-style: normal;">Before we can go on
with some code, it is the time to learn about how the functions,
constants and arguments are actually described in a code object. The
</span><i>LOAD_*</i><span style="font-style: normal;"> functions have
an almost common description:</span></div>
<div align="JUSTIFY" style="margin-left: 1.25cm;">
<i><b>Loads the
{global/local/constant} named {co_names/co_varnames/co_consts}[arg]
onto the stack</b></i></div>
<div align="JUSTIFY">
<span style="font-style: normal;"><br /></span>
<span style="font-style: normal;">Fair enough, but
what are these co_names, co_varnames or co_consts fields? Well, they
are simply tuple objects containing all the needed symbols. For
instance, if </span><i>co_names = (add, divide, subtract)</i><span style="font-style: normal;">
and that we want the subtract function, we will write </span><i>LOAD_GLOBAL
(2)</i><span style="font-style: normal;">, that is put the index of
the wanted function in the </span><i>co_names</i><span style="font-style: normal;">
tuple as operation argument. As explained in the first section, the
argument is simply the value of the two bytes following the opcode in
the bytecode.</span></div>
<div align="JUSTIFY">
<span style="font-style: normal;"><br /></span>
<span style="font-style: normal;">For the sake of
simplicity, we will consider in the following part that we already
have these tuples. Let's now see how to convert a list representation
of our tree to its bytecode representation. The important point is to
realize the similarities between the Python bytecode and our list
representation. The following figure shows it quite straightly.</span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPUVBZNUBUkxZacxyIkqaKQim7a8HNFmZ91NgwoNEpym4Hh04xDfGG_m6GAQ_ZVnxKolcm2fFDdGOG8ISzPBS3J2VLom_spttv1akxP-gUVi-GRRrSUE3QnNXTRQufiDnHEWsKbwN5fWg/s1600/bytecode-generation.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPUVBZNUBUkxZacxyIkqaKQim7a8HNFmZ91NgwoNEpym4Hh04xDfGG_m6GAQ_ZVnxKolcm2fFDdGOG8ISzPBS3J2VLom_spttv1akxP-gUVi-GRRrSUE3QnNXTRQufiDnHEWsKbwN5fWg/s1600/bytecode-generation.png" height="137" width="400" /></a></div>
<span style="font-style: normal;"><br /></span></div>
<div align="JUSTIFY" style="font-style: normal;">
</div>
<div align="JUSTIFY">
<span style="font-style: normal;">As one can see, the
conversion is merely an iteration through our list with the appending
of a corresponding </span><i>LOAD_*</i><span style="font-style: normal;">
for each node, the relative order between the nodes staying the same
(for the record, this order is called </span><i>depth-first</i><span style="font-style: normal;">).
The only tricky part is to add the </span><i>CALL_FUNCTION</i><span style="font-style: normal;">
opcode after the last argument. For this purpose, we keep a list of
the number of arguments of each added node. Whenever we bump into a
terminal node (a leaf), we decrement the argument count of its parent
node. If this count gets to 0, then we know that we are done with
this function, and we add the appropriate </span><i>CALL_FUNCTION</i><span style="font-style: normal;">.
Using this algorithm, we provide a simplified conversion function.</span></div>
<div class="gistLoad" data-id="58de2508f7af2d97506c" id="gist-58de2508f7af2d97506c">
Loading... Please enable Javascript</div>
<div align="JUSTIFY" style="line-height: 120%; margin-bottom: 0.25cm;">
This
function takes a list as argument, and return its corresponding
bytecode. Note that a bytecode must be of type <i>bytes</i><span style="font-style: normal;">,
but this type is not suitable for our manipulations, because it is
non-mutable. Therefore, we use a </span><i>bytearray</i><span style="font-style: normal;">,
a mutable equivalent of </span><i>bytes</i><span style="font-style: normal;">
object. The </span><i>opcode</i><span style="font-style: normal;">
module contains various tools to assist the bytecode creation. In
this case, we use the opmap dictionary, which allows us to write the
opcode in plain text ('CALL_FUNCTION' being far more readable than
0x83).</span></div>
<div align="JUSTIFY" style="line-height: 120%; margin-bottom: 0.25cm;">
We
now have to make Python understand that he has to <i>execute</i><span style="font-style: normal;">
this bytecode. First, we must create a complete code object with it.
While the bytecode obviously plays an important role in a code
object, there are plenty of other information we have to give to
Python:</span><br />
<div class="gistLoad" data-id="4daaa23bb8a7d3d78768" id="gist-4daaa23bb8a7d3d78768">
Loading... Please enable Javascript</div>
<br />
<span style="line-height: 120%;">That's
a bunch of information! Most of these parameters are
self-explanatory, but a few deserve more explanations. The </span><i style="line-height: 120%;">stacksize</i><span style="line-height: 120%;">
parameter controls how many things can be put onto the stack at the
same time. For instance, a recursive call will add elements on the
stack each time. While its exact value is not really important, one
must take care to not exceed it, under threat of a segfault of the
Python interpreter! The </span><i style="line-height: 120%;">flags</i><span style="line-height: 120%;">
argument control various things about the way the code is handle, yet
there is not much documentation about it. The value of 67 (64+2+1)
comes from the actual value given by Python to the executable objects
in the standard GP implementation.</span></div>
<div align="JUSTIFY" style="line-height: 120%;">
<span style="line-height: 120%;"><br /></span>
<span style="line-height: 120%;">Now
that we have the code object, we must associate it with a function.
There are various ways to do it, but one of the most obvious in
Python 3 is to use the </span><i style="line-height: 120%;">types</i><span style="line-height: 120%;">
module again, but to create a function object this time:</span></div>
<div class="gistLoad" data-id="dd3a3b999adbbe78537f" id="gist-dd3a3b999adbbe78537f">
Loading... Please enable Javascript</div>
<div align="JUSTIFY" style="line-height: 120%;">
<span style="font-style: normal;">The
second argument (a dictionary) is mandatory to tell Python which
function actually corresponds to each symbol. Fortunately, its
generation was already implemented in DEAP, since the same mechanism
applies when using the eval function.</span><br />
<span style="line-height: 120%;"><br /></span>
<span style="line-height: 120%;">Another
interesting thing about this approach is that it allows for easy
genetic manipulation, like crossovers or mutations. In a few words,
these operations change the content of the tree by modifying or
exchanging branches. In the standard list implementation, a branch
replacement could be done with a simple slicing operation, since each
branch is stored in a contiguous way. Well, that rationale also
applies to our bytecode representation! In order to identify a
subtree, we just need to obtain the position of the </span><i style="line-height: 120%;">LOAD_GLOBAL</i><span style="line-height: 120%;">
of its root node, and look for the corresponding </span><i style="line-height: 120%;">CALL_FUNCTION</i><span style="line-height: 120%;">
thereafter. These two positions then give us the indices needed for
the slice construction.</span><br />
<span style="line-height: 120%;"><br /></span>
<span style="line-height: 120%;">The
following table and figure show the time required to evaluate trees
of different lengths and the relative speedup. The bytecode
implementation reaches its optimal speedup around 250 nodes, but
provides a considerable gain even for smaller lengths. The speedup
value can be easily understand if we refer to the evaluation
procedure we describe at the beginning of this section: the first two
steps that were executed in O(n) are now done in O(1). Of course, the
result computation itself has not changed, and still has, in our test
case, a O(n) complexity. Overall, in exact notation, we started from
a </span><span style="font-family: Liberation Serif, serif; line-height: 120%;">Θ</span><span style="line-height: 120%;">(3n)
complexity, down to </span><span style="font-family: Liberation Serif, serif; line-height: 120%;">Θ</span><span style="line-height: 120%;">(n+2)
with the bytecode approach, which is coherent with the observed
speedups.</span></div>
<style>
.data-table {
border-collapse: collapse;
margin: 0 auto;
width: 50%
}
.border-top {
border-top: 1px solid #000;
}
.border-bottom {
border-bottom: 1px solid #000;
}
.border-left {
border-left: 1px solid #000;
}
.border-right {
border-right: 1px solid #000;
}
.data-table td {
padding: 0.3em;
}
</style>
<br />
<table class="data-table">
<tbody>
<tr>
<th class="border-top border-bottom border-left border-right"># of nodes</th>
<th class="border-top border-bottom border-right">Standard</th>
<th class="border-top border-bottom border-right">Bytecode</th>
<th class="border-top border-bottom border-right">Speedup</th>
</tr>
<tr>
<td class="border-bottom border-left border-right">1</td>
<td class="border-bottom border-right">4,2674E-05</td>
<td class="border-bottom border-right">2,3882E-05</td>
<td class="border-bottom border-right">1,787</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">3</td>
<td class="border-bottom border-right">5,9971E-05</td>
<td class="border-bottom border-right">3,0232E-05</td>
<td class="border-bottom border-right">1,984</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">7</td>
<td class="border-bottom border-right">8,1913E-05</td>
<td class="border-bottom border-right">3,6364E-05</td>
<td class="border-bottom border-right">2,253</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">15</td>
<td class="border-bottom border-right">1,5890E-04</td>
<td class="border-bottom border-right">7,4422E-05</td>
<td class="border-bottom border-right">2,135</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">31</td>
<td class="border-bottom border-right">2,5088E-04</td>
<td class="border-bottom border-right">9,6281E-05</td>
<td class="border-bottom border-right">2,606</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">63</td>
<td class="border-bottom border-right">4,5313E-04</td>
<td class="border-bottom border-right">1,5996E-04</td>
<td class="border-bottom border-right">2,833</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">127</td>
<td class="border-bottom border-right">8,6494E-04</td>
<td class="border-bottom border-right">2,9974E-04</td>
<td class="border-bottom border-right">2,886</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">255</td>
<td class="border-bottom border-right">1,6832E-03</td>
<td class="border-bottom border-right">5,0897E-04</td>
<td class="border-bottom border-right">3,307</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">511</td>
<td class="border-bottom border-right">3,3082E-03</td>
<td class="border-bottom border-right">1,0151E-03</td>
<td class="border-bottom border-right">3,259</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">1023</td>
<td class="border-bottom border-right">6,5782E-03</td>
<td class="border-bottom border-right">1,9996E-03</td>
<td class="border-bottom border-right">3,290</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">2047</td>
<td class="border-bottom border-right">1,3001E-02</td>
<td class="border-bottom border-right">3,8225E-03</td>
<td class="border-bottom border-right">3,401</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">4095</td>
<td class="border-bottom border-right">2,6315E-02</td>
<td class="border-bottom border-right">7,8760E-03</td>
<td class="border-bottom border-right">3,341</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">8191</td>
<td class="border-bottom border-right">5,4996E-02</td>
<td class="border-bottom border-right">1,5786E-02</td>
<td class="border-bottom border-right">3,484</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">16383</td>
<td class="border-bottom border-right">1,0808E-01</td>
<td class="border-bottom border-right">3,1520E-02</td>
<td class="border-bottom border-right">3,429</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">32767</td>
<td class="border-bottom border-right">2,1563E-01</td>
<td class="border-bottom border-right">6,2349E-02</td>
<td class="border-bottom border-right">3,459</td>
</tr>
<tr>
<td class="border-bottom border-left border-right">65535</td>
<td class="border-bottom border-right">4,3286E-01</td>
<td class="border-bottom border-right">1,2690E-01</td>
<td class="border-bottom border-right">3,411</td>
</tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOIKqzzFjYmPkCZV5gdYnPMfrwdGQYJvq_Soe1RkLz8pwygqxkmcZjc7FKG5SVXQvwLGjsHtXgrrP_KzbdGHJBFlDpA3xrh1JkUFig8wnfNwfamomIiVZf6yEfDbANfy5zWBRRBhpmse0/s1600/perf-tree-size.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOIKqzzFjYmPkCZV5gdYnPMfrwdGQYJvq_Soe1RkLz8pwygqxkmcZjc7FKG5SVXQvwLGjsHtXgrrP_KzbdGHJBFlDpA3xrh1JkUFig8wnfNwfamomIiVZf6yEfDbANfy5zWBRRBhpmse0/s1600/perf-tree-size.png" height="408" width="640" /></a></div>
<br />
<h3>
Conclusion</h3>
<div align="JUSTIFY">
We have provided a simple way to speed up a genetic
programming evolution by directly evolving Python bytecode, without
intermediate representations. This generally divides by two the
computation time needed to perform an evolution, which is non
negligible with real world problems. This is especially important
when taking into account that as most stochastic methods,
evolutionary algorithms needs to be run at least a couple of times to
produce statistical significant results. The complete code is
available at <a href="https://github.com/mgard/deap">github.com/mgard/deap</a>. However, it should be
noted that this code includes many hacks to make it fully compliant
with DEAP API (so a lambda user does not have to worry about which
tree implementation he is using), and is still in development to
further improve performance and reliability – one funny
characteristic of our approach is that it is going so deep into
Python internals that it could actually segfault the interpreter
whenever an error occurs...<br />
<br />
If the clever mind from section 3 did not run away
already (which would probably be the reasonable thing to do instead
of reading an article describing a weird and unclean optimization
technique targeting a very narrow topic of an already narrow field
on a specific language), he might notice one interesting thing in the
second profiling figure. The evaluation itself now takes about two
thirds of the total computation time. Moreover, even basic arithmetic
functions like multiply or divide are called so often (almost 100
million times!) that they take up to 20% of the total execution time!
That's clearly where we should focus in order to further improve
performance. But how could we do it?</div>
<br />
<div align="JUSTIFY">
Well, in this post, we had fun with <i>Python</i>
bytecode. Maybe it's time to take the next step, and use a different
type of bytecode, even more low level. But that will be the subject
of another story...</div>
</div>
Anonymoushttp://www.blogger.com/profile/12152684894202279181noreply@blogger.com4tag:blogger.com,1999:blog-569950264826755092.post-10878945074775136032014-05-31T17:35:00.000-04:002014-06-06T23:00:28.858-04:00Sound decoding: A tale of visualization<h3 style="text-align: justify;">
Let's keep informed by listening to the radio</h3>
<div style="text-align: justify;">
Who isn't thrilled by mystery spy stories and espionage, challenges and ciphers to understand and decrypt? I was recently fascinated by <a href="https://archive.org/details/ird059">The Conet Project</a>, which is twenty years of recordings of unknown broadcasts on shortwave radios. The public distribution of these recording created some <a href="http://www.enigma2000.org.uk/">small</a> <a href="http://irdial.com/crackhome.htm">communities</a> striving to comprehend these messages.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Most of the recordings are numbers spoken with an text-to-speech software that could probably be used with a <a href="https://en.wikipedia.org/wiki/One-time_pad">one-time pad</a> in order to retrieve a message. But some of them hold tones which probably maps to numbers. Let's find these numbers!</div>
<h3 style="text-align: justify;">
Down the rabbit hole while riding a snake</h3>
<div style="text-align: justify;">
We'll concentrate on track 39 of disk 2 of The Conet Project, which sounds like this:</div>
<div style="text-align: center;">
<audio controls="">
<source src="https://archive.org/download/ird059/tcp_d2_39_high_pitch_polytone_irdial.mp3"></source>
To play this song, please update your ancient browser to an alternative that supports HTML5.
</audio><br />
<div style="text-align: center;">
<a href="https://archive.org/download/ird059/tcp_d2_39_high_pitch_polytone_irdial.mp3">Here is the link</a> to the audio if your browser does not support HTML5.
</div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
It begins with a repetitive pattern, probably to draw the attention (and not <a href="https://en.wikipedia.org/wiki/Numbers_station#The_Atenci.C3.B3n_spy_case_evidence">¡Atención!</a>) of potential listeners. Then, at 1:30, a preamble-like pattern emerges which then makes place at 1:33 for a 1 second clock-like pattern followed by a message. The booklet of the recording states:</div>
<div style="text-align: justify;">
"High Pitch Polytone: 5 Figure groups, represented by tones, the lowest being a space. No real information on these stations, save that they may be related to M12."</div>
<div style="text-align: justify;">
M12 is the Enigma Identifier of a Russian operated number station. More information available on <a href="http://priyom.org/number-stations/morse/m12.aspx">the Priyom entry of M12</a>. Recent recordings of M12 seems to be monotonal morse, but anyways... I'll let your mind ramble on these ideas.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Fantasies aside, how can we visualize and extract the data contained in this particular track? By gleaning various <a href="https://en.wikipedia.org/wiki/Series_of_tubes">intertubes</a> references such as <a href="http://glowingpython.blogspot.ca/2011/09/uncertain-principle-and-spectrogram.html">this</a> or <a href="http://coreygoldberg.blogspot.ca/2013/06/generating-audio-spectrograms-in-python.html">this</a>, I came up with the satisfactory amplitude + <a href="https://en.wikipedia.org/wiki/Spectrogram">spectrogram</a> chart as shown in this figure:</div>
<div class="separator" style="clear: both; text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfbwl9CokcrjAYkXZzmmrTK95HX_NnBsx-5_mK0uvhnolK9K2uxmcPFkx_JbR24cp9V4zmVYP_EntOA3g45LrPqjEn0Y9Ysns8-cFnXzikC0EJcaFg-x0dmt5fQTUEq9MtC-9roqwphjk/s1600/amplitude_specgram.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgfbwl9CokcrjAYkXZzmmrTK95HX_NnBsx-5_mK0uvhnolK9K2uxmcPFkx_JbR24cp9V4zmVYP_EntOA3g45LrPqjEn0Y9Ysns8-cFnXzikC0EJcaFg-x0dmt5fQTUEq9MtC-9roqwphjk/s1600/amplitude_specgram.png" height="480" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmHELKKTp-NOsnjdW9NivKbQeimAmdITcARZkI1N1sjX_0dSlAaelyBaW-jGg-ChNShjvjQWHnJT5Uq_GeCiD4IAWZMcBKLdkOohcpJMd_0M9YRVGW1vWfLhIyBwFBFXIbgRqAi3dY61g/s1600/amplitude_specgram_zoom.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhmHELKKTp-NOsnjdW9NivKbQeimAmdITcARZkI1N1sjX_0dSlAaelyBaW-jGg-ChNShjvjQWHnJT5Uq_GeCiD4IAWZMcBKLdkOohcpJMd_0M9YRVGW1vWfLhIyBwFBFXIbgRqAi3dY61g/s1600/amplitude_specgram_zoom.png" height="480" width="640" /></a></div>
<div style="text-align: justify;">
As it can be seen from the zoomed figure, we can clearly see the tones. They are defined by their fundamental frequency, the darkest red bar at the bottom of the spectrum chart, and their <a href="https://en.wikipedia.org/wiki/Overtone">overtones</a>, the lighter red bars. These figures were obtained by this Python code:</div>
<div style="text-align: justify;">
<br /></div>
<div class="gistLoad" data-id="9f0d89d3372e2f9be928" id="gist-9f0d89d3372e2f9be928">
<div style="text-align: justify;">
Loading... Please enable Javascript</div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This code uses the <a href="http://matplotlib.org/api/mlab_api.html#matplotlib.mlab.specgram">specgram functionality</a> of matplotlib which is based on the <a href="https://en.wikipedia.org/wiki/Discrete_Fourier_transform">Discrete Fourier Transform (DFT)</a> to obtain the frequencies contained in blocks of data. I choose the blocks of data to have a fixed size of 512. Every column of the spectrogram is a Fourier transform of a 512-size block in the audio signal. This size represents 11.6 ms of the signal and allows the analysis of frequencies from 22,050 Hz (half of 44,100 Hz because of the <a href="http://en.wikipedia.org/wiki/Nyquist_frequency">Nyquist Frequency</a>) down to 43 Hz ( $ \dfrac{44100 \; \mathrm{Hz}}{2 \cdot 512} $ ). All that is well and interesting, but how can we automatically find the values represented by each tone?</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The plan is to take every column of the spectrogram, namely the Fourier transform of each block, and find their global maximum. For that matter, let's also take all the local maxima, which will give us the shape of the tone emitted. Here is how a single column looks like when the color of the spectrogram is mapped to the y-axis:</div>
<div class="separator" style="clear: both; text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhklW60t27S027DWyGQGqV75CQxcvRHV4ktVJ6x1a7fSGUbR1fB99lL6Ku22ncMpYOKSFiV6M4as9w1bGyQ7lYqRceieBjAEdI5Vr8uT_WhM-rcUfcBYWJc1eP9q9fEK0VpDJv5awIM0v8/s1600/FFT.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhklW60t27S027DWyGQGqV75CQxcvRHV4ktVJ6x1a7fSGUbR1fB99lL6Ku22ncMpYOKSFiV6M4as9w1bGyQ7lYqRceieBjAEdI5Vr8uT_WhM-rcUfcBYWJc1eP9q9fEK0VpDJv5awIM0v8/s1600/FFT.png" height="480" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The top chart shows the previous spectrogram and highlights the band at $ t = 2 \; \mathrm{s} $. The middle chart shows the Fourier transform. The bottom one shows the same transform but displayed with an logarithmic y-axis, or, as we are used to see it, enlarged to show texture. As we can see it, there is clearly a single prominent fundamental frequency around 430 Hz followed by its overtones. We can also determine that the tone was synthesized up to ~13.1 kHz and the rest seems to be noise. Interesting fact: even harmonics are much attenuated compared to the odd harmonics. This is because the tone generated is an approximation of <a href="http://en.wikipedia.org/wiki/Square_wave">square waves</a>. The code for this is available <a href="https://gist.github.com/soravux/e1355535fff3cc07c3a1">here</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
So, the only thing remaining is to find are the peaks of the Fourier Transform. This can be done using the <a href="http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.signal.find_peaks_cwt.html">find_peaks_cwt</a> function of scipy which <a href="http://en.wikipedia.org/wiki/Convolution">convolves</a> a <a href="http://en.wikipedia.org/wiki/Wavelet">wavelet</a> to the signal to find the desired features. But the data points of our frequency signal have way too much space between themselves for this kind of processing. This is caused by the size of the blocks we used to compute the Fourier transform. The wider we are in the time domain, the narrower the bins are in the frequency domain, and <i>vice versa</i>. So it would be solved by invoking specgram() with a larger NFFT parameter, but another way to solve it is to <a href="http://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.signal.resample.html">resample</a> the frequency signal.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Let's apply this peak finding function on every block in our recording. After that, the only thing left is to merge the similar data together in order to get the beginning and the end of a tone. To do so, we'll seek for similar frequencies with an high amplitude. In a single tone signal, the most powerful harmonic is called the fundamental frequency. If the fundamental frequency in the following block is inside a certain percentage margin of the fundamental frequency of the current block, it should indeed be the same tone.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Once we get the number of tones and their frequency, we'll decode their values. We can get a pretty good idea of what we are dealing with by analyzing an histogram of the frequencies:</div>
<div class="separator" style="clear: both; text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEic9amCk2lZUAsnBZLM8vOCo_pfz9l3GMyrL_WdMFX1wYZjWgryBJ5x4pVSk-3BXsfPRLRo24uSQYYIUZhZ9q_Mqfmc9R4xgVTUUaLkG_T_HpGM3GHVwVIRpPu44mcRhXWn3UKzgBZvSYA/s1600/freq_histogram.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEic9amCk2lZUAsnBZLM8vOCo_pfz9l3GMyrL_WdMFX1wYZjWgryBJ5x4pVSk-3BXsfPRLRo24uSQYYIUZhZ9q_Mqfmc9R4xgVTUUaLkG_T_HpGM3GHVwVIRpPu44mcRhXWn3UKzgBZvSYA/s1600/freq_histogram.png" height="298" width="400" /></a></div>
<div style="text-align: justify;">
We can forget the outlier at 133 Hz which is the buzz or scratch at the end of the recording. We can see the four most common tones from the pattern in the first minute and half : ~305 Hz, ~400 Hz, ~340 Hz, ~422 Hz. Another interesting note: the data ranges from 300 Hz to 521 Hz. Curiously, this range gives roughly 10 symbols on the <a href="http://en.wikipedia.org/wiki/Piano_key_frequencies">equal tempered scale</a>. We can try to put the tones in bins between E♭4 and C5. This gives us the following key number sequence:</div>
<blockquote class="tr_bq">
<div style="text-align: justify;">
<span style="font-family: Courier New, Courier, monospace; font-size: x-small;">[43, 47, 45, 48, 43, 45, 43, 47, 45, 48, 43, 48, 45, 48, 43, 47, 45, 48, 43, 45, 43, 47, 45, 48, 43, 47, 45, 48, 43, 47, 45, 48, 43, 45, 43, 47, 45, 48, 43, 48, 45, 48, 43, 48, 46, 48, 43, 45, 44, 48, 45, 48, 43, 48, 45, 48, 43, 47, 48, 45, 45, 45, 48, 43, 45, 43, 48, 45, 45, 48, 43, 48, 45, 48, 43, 48, 45, 48, 43, 45, 43, 48, 45, 48, 43, 48, 45, 48, 43, 47, 45, 48, 43, 45, 43, 51, 51, 43, 43, 43, 43, 43, 43, 45, 52, 46, 47, 49, 43, 45, 52, 45, 46, 48, 43, 47, 51, 49, 46, 52, 43, 51, 48, 44, 47, 52, 43, 51, 45, 50, 46, 43, 46, 50, 47, 45, 48, 43, 46, 52, 49, 47, 43, 49, 51, 46, 43, 46, 49, 48, 45, 43, 45, 49, 50, 51, 50, 43, 45, 47, 50, 45, 48, 43, 45, 50, 52, 46, 43, 45, 49, 45, 52, 46, 44, 45, 47, 48, 52, 49, 43, 45, 48, 50, 43, 45, 47, 50, 45, 43, 48, 45, 47, 51, 43, 51, 52, 48, 44, 50, 43, 50, 45, 49, 50, 48, 43, 49, 46, 52, 46, 48, 43, 49, 52, 49, 52, 51, 43, 48, 51, 45, 48, 50, 43, 45, 47, 49, 48, 43, 45, 50, 48, 49, 43, 50, 49, 50, 48, 43, 48, 46, 52, 47, 48, 43, 47, 48, 52, 50, 52, 44, 52, 44, 52, 44, 52, 44, 52, 44, 52, 44, 52, 44, 52, 44, 52, 44, 52, 44, 52]</span></div>
</blockquote>
<div style="text-align: justify;">
This was given by <a href="https://gist.github.com/soravux/6929ed6105a9b25a51f4">the code over here</a>. Note that it is only a quick proof of concept patched together in an evening; many improvements should be made to this code. Things like enhanced robustness (adaptative constant for NFFT, for instance) or averaging the harmonics of the tone instead of taking only the harmonics footprint of the first sample may be greatly beneficial to it.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Using Lilypond, <a href="https://gist.github.com/soravux/2134c0dbc88072981544">this script</a> (sorry for the hardcoding) can generate a beautiful sheet music representation of the tones. Note length was omitted for this figure; only their pitch was considered.</div>
<div class="separator" style="clear: both; text-align: justify;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdvmW_Srqz-4Wnt1eszF4zz-ETZfCSwAznWEmyDO3m9jlJA7XpvukbZIryxpHUMMJ5-AV8Ija5RME-HeWysCu8EoeR3MWoUvcv-UpWjghzTeJu9KMANhlyAOf8WPLr8tjkjcV6Vo0UyiM/s1600/sheet.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgdvmW_Srqz-4Wnt1eszF4zz-ETZfCSwAznWEmyDO3m9jlJA7XpvukbZIryxpHUMMJ5-AV8Ija5RME-HeWysCu8EoeR3MWoUvcv-UpWjghzTeJu9KMANhlyAOf8WPLr8tjkjcV6Vo0UyiM/s1600/sheet.png" height="640" width="612" /></a></div>
<div style="text-align: justify;">
Another representation may be of interest: text. We can convert the lowest note with a space as the booklet supposes and convert all the other notes from a to i using this simple code:</div>
<div class="gistLoad" data-id="f71ea8f453513e42b8f1" id="gist-f71ea8f453513e42b8f1">
<div style="text-align: justify;">
Loading... Please enable Javascript</div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Once done so, the following text emerges:</div>
<blockquote class="tr_bq">
<div style="text-align: justify;">
dbe b dbe ebe dbe b dbe dbe dbe b dbe ebe ece baebe ebe debbbe b ebbe ebe ebe b ebe ebe dbe b hh bicdf bibce dhfci headi hbgc cgdbe cifd fhc cfeb bfghg bdgbe bgic bfbicabdeif beg bdgb ebdh hieag gbfge fcice fifih ehbeg bdfe bgef gfge ecide deigiaiaiaiaiaiaiaiaiaiai</div>
</blockquote>
<div style="text-align: justify;">
This could be a great starting point to analyze its content. We realize that the two alternating notes at 1:30 was wrongly decoded as "hh". Also, some segments were wrongly decoded as the data after 1:33 is always a group of 5 tones. You can convince yourself by listening to it using <a href="http://www.videolan.org/vlc/index.html">VLC</a> to reduce the play speed. Hence, we've got 11 false decoding out of 27. Enhancing the tone merging algorithm should do the trick for most errors, as the input is noisy and sometimes distorted. If it was not grouped as it was, text statistics could then be applied to try to understand if these are really words. If the histogram of word length matches the English corpus, this data could be considered words. But since it is always grouped by 5, its likelihood of being words is really low. If we replace the alphabetic with numerals, the following sequence is found:</div>
<blockquote class="tr_bq">
<pre style="background-color: white; border-bottom-left-radius: 0px; border-bottom-right-radius: 0px; border-top-left-radius: 0px; border-top-right-radius: 0px; border: 0px; font-size: 14px; line-height: 17.603300094604492px; padding: 0px; text-align: justify; vertical-align: baseline; white-space: pre-wrap; word-break: break-all; word-wrap: break-word;"> 425 2 425 525 425 2 425 425 425 2 425 525 535 21525 525 452225 2 5225 525 525 2 525 525 425 2 88 29346 29235 48639 85149 8273 37425 3964 683 3652 26787 24725 2793 26293124596 257 2472 5248 89517 72675 63935 69698 58257 2465 2756 7675 53945 4597919191919191919191919</pre>
</blockquote>
<div style="text-align: justify;">
We see that the last notes are a repetition of the highest and lowest notes (excluding the space delimiter). This seems like an end of transmission pattern.</div>
<h3 style="text-align: justify;">
Spy phone home</h3>
<div>
<div style="text-align: justify;">
Truth be told, signal analysis is kind of my nemesis. To take this exercise to the next level, I decoded a phone number from a telephone number signal record. The idea came from two main sources: first, a <a href="http://en.wikipedia.org/wiki/Charbonneau_Commission">commission</a> currently taking place on my local news which did not censor the dial tones of telephone clips played in public audiences. Secondly, a friend of mine criticized vigorously the poor conclusion I was trying to pull forth, which was supposed to be: "Oh, I found the sequence, but there's no known way to decrypt it!"</div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
<div>
<div style="text-align: justify;">
So I loaded this phone number signal (read: generated it <a href="http://dialabc.com/sound/generate/index.html">here</a>):</div>
</div>
<div>
<div style="text-align: justify;">
<br /></div>
</div>
<div style="text-align: center;">
<audio controls="">
<source src="http://dialabc.com/i/cache/dtmfgen/wavpcm8.300/562d4897.wav"></source>
To play this song, please update your ancient browser to an alternative that supports HTML5.
</audio><br />
<div style="text-align: center;">
<a href="http://dialabc.com/i/cache/dtmfgen/wavpcm8.300/562d4897.wav">Here is the link</a> to the audio if your browser does not support HTML5.
</div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Using the aforementioned notions results into <a href="https://gist.github.com/soravux/1ce124315bc6d1f3d430">this code</a> which gives the correct decoding: <b>562-4897</b> (<a href="http://xkcd.com/221/">randomly chosen</a> number, dial it at your own risk!). One thing that merits mention: Phone signaling is (usually) done using <a href="http://en.wikipedia.org/wiki/Dual-tone_multi-frequency_signaling">Dual-Tone Multi-Frequency (DTMF)</a>, meaning that a tone containing two different frequencies (that are not a multiple of one another) is used for each keypad entry. To decode these two frequencies, I took the two most important frequencies (with the highest amplitude) and got the symbol which minimized the sum of the square of the errors. This common method is called the <a href="http://en.wikipedia.org/wiki/Least_squares">Least Squares</a>.</div>
<h3 style="text-align: justify;">
Conclusion</h3>
<div style="text-align: justify;">
Decoding signals can be daunting, but it should be a breeze with the proper tools. I am impressed to see how close we are from a polyphonic sheet extractor from an audio file. These notions could also be used to recognize the emitter of a sound or a handful of other uses. I'll let you be creative.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
But enough conspiracy theories for today. Hope you enjoyed!</div>
<div style="text-align: justify;">
Thanks to Alexandre Boily who taught me the existence of the Conet project.</div>
Yannick Hold-Geoffroyhttp://www.blogger.com/profile/17227249978993145288noreply@blogger.com0tag:blogger.com,1999:blog-569950264826755092.post-11061057623228500632014-04-08T11:04:00.002-04:002014-05-06T17:35:36.537-04:00Math evolution and dirty tricks<div style="text-align: justify;">
<h3>
Curiosity killed the cat</h3>
I've always been amazed by early game developers. Unlike today, they had to bend and twist software to its extreme limits to squeeze all the performance they could out of elder hardware. Epic and glorious battles where illustrious unknowns fought the machine in the darkness of their basement, such as <a href="http://blog.moertel.com/posts/2013-12-14-great-old-timey-game-programming-hack.html" target="_blank">this legend</a>. The result of these <a href="https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming" target="_blank">very pleasant evenings</a> are often food for though... or blog entries.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
One of such examples is the <a href="https://en.wikipedia.org/wiki/Fast_inverse_square_root" target="_blank">fast inverse square root</a> algorithm found in the <a href="https://en.wikipedia.org/wiki/Quake_III_Arena" target="_blank">Quake III game</a>. The mathematical relation $ \dfrac{1}{\sqrt{x}} $ is needed to compute normalized vectors which are handy to compute light in a software renderer. There was a problem, though. Performing a floating point a square root followed by a division on a 1990-era processor was painfully slow, resulting in dreadfully low frame rates and poor player experience. Nowadays, CPUs (using SIMD instructions such as rsqrtss) or GPUs perform the inverse square function in real-time with high accuracy, making this trick obsolete. But it is so mind boggling that it is interesting to analyse.<br />
<h3>
It's turtles all the way down</h3>
</div>
<div style="text-align: justify;">
Let's examine this trick. First, an evil alien bit hack using the 0x5f3759df constant and a right shift on a floating point number (!) is performed, followed by an iteration of the <a href="https://en.wikipedia.org/wiki/Newton%27s_method" target="_blank">Newton-Raphson method</a>. Here is the code directly taken from the <span style="background-color: #eeeeee;">./code/game/q_math.c</span> source file from the 1.32b version of the <a href="ftp://ftp.idsoftware.com/idstuff/source/quake3-1.32b-source.zip" target="_blank">Quake III source code</a>:</div>
<div class="gistLoad" data-id="9963167" id="gist-9963167">
Loading ....</div>
<div style="text-align: justify;">
As you can see, the core of the fast inverse square root trick is on line 10. A right shift, followed by a subtraction. All performed by the ALU. Way faster than floating-point <a href="https://en.wikipedia.org/wiki/Machine_epsilon" target="_blank">epsilon-perfect</a> computation.</div>
<br />
<div style="text-align: justify;">
That made me think. Can I come up with this kind of trick? But that would require a deep analysis of the floating point representation (<a href="https://en.wikipedia.org/wiki/IEEE_floating_point" target="_blank">IEEE 754</a>), and, quite frankly, i'm a lazy guy. Why do it myself when I can let the computer do it?</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Armed with my favorite Evolutionary Algorithm library in Python, <a href="https://code.google.com/p/deap/" target="_blank">DEAP</a>, I embarked on the journey of finding dirty floating point representation tricks to approximate common functions. The idea is to let <a href="https://en.wikipedia.org/wiki/Genetic_programming" target="_blank">Genetic Programming</a> (GP) evolve an equation using integer tricks to represent floating point operations. Two objectives were set:</div>
<ol>
<li style="text-align: justify;">Accuracy;</li>
<li style="text-align: justify;">Total operation cost.</li>
</ol>
<div style="text-align: justify;">
Conceptually quite easy to implement. But a problem arose: these integer tricks are highly dependent on precise constants that are way too hard for GP to guess. To circumvent this issue, I built a two stage evolution. While keeping the first GP stage to evolve mathematical equations, every individual evaluation would require a parameter optimization. The individuals would have terminals similar to inputs but in facts are constants that must be found by the second stage of evolution. To perform this second stage, a simpler algorithm based on <a href="http://docs.scipy.org/doc/scipy/reference/optimize.html" target="_blank">Scipy's optimize module</a> is used. This stage won't be able to solve large and difficult equations (individuals) that may produce the GP or may be stuck in local minimas, but large individuals mean costly to execute and thus not an interesting path to scrutinize.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Long story short, here is the code:</div>
<div class="gistLoad" data-id="9673839" id="gist-9673839">
Loading ....</div>
The program is roughly divided as such:<br />
<ul>
<li>Lines 1 - 16: Imports. You will need numpy, scipy and deap to run the example;</li>
<li>Lines 18 - 36: Safe arithmetic operators. Without them, division by zeros and similar would occur during evolution;</li>
<li>Lines 39 - 49: Operation execution cost. The values are more rule of thumb ideas than actual benchmarks;</li>
<li>Lines 51 - 85: Helper function to show the equation in its <a href="https://en.wikipedia.org/wiki/Infix_notation" target="_blank">infix notation</a> instead of DEAP's default prefix. It only improves readability and does not affect evolution;</li>
<li>Lines 88 - 103: Primitives definition. Sadly, four parameters (a, b, c, d) were defined instead of a dynamic array, but this simplification should do the trick;</li>
<li>Lines 105 - 112 and 152 - 188: DEAP's usual setup and execution;</li>
<li>Lines 114 - 115: The simplest way I found to reinterpret floats in integers in Python;</li>
<li>Lines 117 - 128: Second stage of optimization that tries to get the best parameters (a, b, c, d) of an individual. Line 122 defines the error minimization function;</li>
<li>Lines 130 - 147: First stage of optimization, GP individual handling;</li>
<li>Line 149: Set of input values on which we strive to minimize the error.</li>
</ul>
<h3>
Data, reporting for duty</h3>
Once the program has run, most entries from the hall of fame (best individuals) are analogous to $ (a + a) - (x >> 1) $ or other variations around constants. No other equation family were up to battle against it.<br />
<br />
As it can be seen, we found the original equation from Quake 3. Well, almost... we clearly see a problem with GP: bloat. It found $ (a + a) $ instead of $ a $, which are semantically identical given that $ a $ is a constant. If we simplify the equation manually, the second stage found an optimal value around 0x5f33f870, which is pretty near the original magic constant. Fine-tuning the <a href="https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method" target="_blank">Nelder-Mead algorithm</a> of the second stage would give a better constant. The general look of the function is shown in the following figures.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhl_Z25hAaRRfNHMdovqhLJ-GHtiislJ9O-xs_4N_mlLl6qe8AmF_ttiJcsoXHYTlvQwbcA6rhTuwaPEVqL_p5Id9OqRuvpZ6kQs7EzmnxWSXLDT7FeCFF0YiOVZFnCeoAlJicFXGFmi1A/s1600/functions.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhl_Z25hAaRRfNHMdovqhLJ-GHtiislJ9O-xs_4N_mlLl6qe8AmF_ttiJcsoXHYTlvQwbcA6rhTuwaPEVqL_p5Id9OqRuvpZ6kQs7EzmnxWSXLDT7FeCFF0YiOVZFnCeoAlJicFXGFmi1A/s1600/functions.png" height="300" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
Figure 1: Output of $ a - (x >> 1) $ (blue) and <span style="text-align: justify;">$ \dfrac{1}{\sqrt{x}} $ (green) for input values [0, 50].</span></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxNgCKojG-5ytoaUkM1NN5-_NbP-ri5e6tbRS8T7YgP2CdcCr5Myc5llKuQ9fnJ3nFyTzBLscalcRGY3RzuiIC_bbZLuadp7zAbfcmEJmLwtU-QFx7ujUualjgnsWJnDF6hrcSwd58ANU/s1600/precision_large.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxNgCKojG-5ytoaUkM1NN5-_NbP-ri5e6tbRS8T7YgP2CdcCr5Myc5llKuQ9fnJ3nFyTzBLscalcRGY3RzuiIC_bbZLuadp7zAbfcmEJmLwtU-QFx7ujUualjgnsWJnDF6hrcSwd58ANU/s1600/precision_large.png" height="440" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
Figure 2: General view of the relative error of the $ a - (x >> 1) $ function</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1CdBadV4x7oEcpZ5cz8GUaPWplO3e-GFQN_4vspLpAuTSCysn4Nqne2sLMIicDBdAtYtYEwYQVtP4sxAISF3AAgVBtbtpnmhCQBw8cDyOUvK7OiEK5QFlG6iz6UQza7bZO_NIIfG0Cew/s1600/precision.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1CdBadV4x7oEcpZ5cz8GUaPWplO3e-GFQN_4vspLpAuTSCysn4Nqne2sLMIicDBdAtYtYEwYQVtP4sxAISF3AAgVBtbtpnmhCQBw8cDyOUvK7OiEK5QFlG6iz6UQza7bZO_NIIfG0Cew/s1600/precision.png" height="366" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
Figure 3: Relative error of the $ a - (x >> 1) $ function with different values of <b>a</b> zoomed around 0x5f340000</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<h3 style="clear: both; text-align: left;">
<span style="text-align: justify;">Facing my Waterloo</span></h3>
<div>
<span style="text-align: justify;">Sadly, my selfish self found no better trick than the one in the Quake III source. At least, the original™ fast inverse square root trick was found. In the next episode, we'll try to uncover other novel and unpredictable approximations to other common expressions. Stay tuned!</span></div>
<br />Yannick Hold-Geoffroyhttp://www.blogger.com/profile/17227249978993145288noreply@blogger.com3tag:blogger.com,1999:blog-569950264826755092.post-63714579319315168732013-09-15T22:30:00.000-04:002015-01-27T00:11:37.227-05:00Review of USB Condoms<h2 style="text-align: justify;">
We're in danger</h2>
<div style="text-align: justify;">
Computer software anomalies often have their biological analogies such as viruses or bugs. Transmission of these virtual infectious diseases by physical contact is observed since the dawn of the digital age, be it on floppy disks, optical disks or USB drives. But recently, a new trend of threat transmission targeting mobile systems began to spread. Since the mobile era, more and more devices (cellphones, music players, etc.) use their USB connector for multiple purposes such as data transfer and charge. This allows malicious computer programs to inject malware on a device when its owner only wants to charge it. Worst: one can <a href="http://krebsonsecurity.com/2011/08/beware-of-juice-jacking/" target="_blank">modify a simple charger to propagate these treats</a>. The problem? It begins to <a href="http://myandroidchief.com/simple-charger-hack-can-install-malware-on-any-ios-device/" target="_blank">spread to public and mainstream systems</a>.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The solution to these virtual physically transmitted diseases? An USB Condom! <a href="http://int3.cc/collections/frontpage/products/usbcondoms" target="_blank">Available through int3.cc</a>, this simple board only connects the power pins through while cutting the data pins. Simple and clever. But can we presumptuously remove half the USB 2.0 wires without side effects?</div>
<h2 style="text-align: justify;">
The current state of affairs</h2>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="http://int3.cc/collections/frontpage/products/usbcondoms" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="http://cdn.shopify.com/s/files/1/0244/5107/products/usb-condom-1169_project-body_1024x1024.jpg" height="213" width="320" /></a></div>
<br />
<div style="text-align: justify;">
The USB connectors are defined with 4 (2.0) or 10 (3.0) connections which always contain two power pins (5V and ground) in addition to their data pins.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Let's read some power specifications directly from the ultimate reference, the Kamasutra of USB, <a href="http://www.usb.org/developers/docs/" target="_blank">the official USB 2.0 standard documentation</a>, pages 171 and 245:</div>
<blockquote class="tr_bq">
<div style="text-align: justify;">
<span style="font-family: Times, Times New Roman, serif;"><i>Depending on the power capabilities of the port to which the device is attached, a USB device </i></span><span style="font-family: Times, Times New Roman, serif;"><i>may be able to draw up to five unit loads from V BUS after configuration.</i> <i>A unit load is defined to be 100 mA.</i></span></div>
</blockquote>
<div style="text-align: justify;">
It is also written on page 178 of the USB 2.0 specifications document that a device described as high-power should require a maximum of 500 mA on the 5 V power rail of the USB connector after being configured and 100 mA maximum during power-up (page 171).</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The USB 3.1 specification rise the high-power current draw to a maximum of 900 mA (6 unit loads of 150 mA) when the client device is configured but states, on chapter 11.4 of its specifications:</div>
<blockquote class="tr_bq">
<blockquote class="tr_bq">
<div style="text-align: justify;">
<i>Note that a USB 3.1 peripheral device shall not draw more than 100 mA until it detects far-end Rx terminations in the unconfigured state.</i></div>
</blockquote>
</blockquote>
<div style="text-align: justify;">
Finally, - the last quote from the specifications, I promise - the <a href="http://www.usb.org/developers/devclass_docs" target="_blank">battery charging specification</a> allows (in its revision 1.2 from December 2010) to draw up to a whooping 1500 mA from an USB port of a compliant device for charging purposes without the need to connect the data pins (after a delay of maximum 900 ms). But this amazing power is only available through hardware modifications of the original USB host specification, which most computers and laptops don't implement.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
It should be noted that many manufacturers of portable devices and adapters do not follow these specifications. Most wall adapters can deliver <a href="http://store.apple.com/ca/product/MD836LL/A/apple-12w-usb-power-adapter" target="_blank">more power</a> than any standard specification, even the battery charging specification. For example, the <a href="http://www.hardkernel.com/renewal_2011/products/prdt_info.php?g_code=G135341370451" target="_blank">ODROID-U2</a> uses an <a href="http://en.wikipedia.org/wiki/EIAJ_connector" target="_blank">EIAJ-01</a>-like connector to transmit 10 Watts! But who will condemn this practice - aside the firefighters - when it means faster charge times?</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
With this kind of devices, the USB Condom will do a good job protecting your beloved electronics from electronic STDs while allowing them to charge freely.</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
But you won't have this luck when charging from a computer, a laptop or any other power delivering device that is more likely to follow specifications. The USB enumeration process, which uses the data pins, will likely fail when using the USB Condom. The device providing power will not be able to read the configuration registers stating the high power requirement of the charging device. This will produce a maximum charge of 100 mA, which is the maximum allowed current during power-up (unconfigured state) and which coincidentally can be near or under the device power consumption. Result? An awfully slow charge or even no charge at all!</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
This behavior particularly hinders the utility of the USB Condom since its goal is to charge your portable devices while protecting it from malicious communications which are mainly from computers - which may have viruses - or a fraudulent charger containing an embedded microcontroller delivering malwares. Using the USB Condom will result in a slow or no charge from these sources, depending on how much the pirate manufacturer of the charger respected the specifications. It's a shame; when you put a condom, it should draw some satisfaction!</div>
<h2 style="text-align: justify;">
There shall be light</h2>
<div>
<div style="text-align: justify;">
One way to circumvent this problem would be to include a <a href="http://www.maximintegrated.com/datasheet/index.mvp/id/7968" target="_blank">USB charger detector IC</a> or a simple microcontroller with an USB stack such as the ones available from <a href="http://www.microchip.com/pagehandler/en-us/technology/usb/products/devicemcus.html" target="_blank">Microchip</a>. While keeping the power connectors linked to the protected device, its data pins would be replaced by the microcontroller's pins that would configure as a generic high-power device while ignoring any communication attempt. Sadly, this will boost the manufacturing price of the USB Condoms. But it will allow the charging sequence to respect the original USB specifications. This new device could even capture every malware injection attempt and create an Virtual STD <a href="http://en.wikipedia.org/wiki/Honeypot_(computing)" target="_blank">Honeypot</a>! What an appetizing idea!</div>
</div>
Yannick Hold-Geoffroyhttp://www.blogger.com/profile/17227249978993145288noreply@blogger.com1tag:blogger.com,1999:blog-569950264826755092.post-74655687127918377162013-09-02T14:42:00.001-04:002014-05-06T17:28:36.796-04:00How to lazily entertain a cat (Part 1)<h2>
Laser-powered Cat Entertainment System (LCES)</h2>
Cats love lasers.<br />
<br />
So let's build a Laser-powered Cat Entertainment System (LCES). It's a fancy name for a laser that shakes autonomously. I know this has already been done a <a href="http://www.instructables.com/id/CatBot-Automated-Cat-Laser/?ALLSTEPS" target="_blank">million</a> <a href="http://www.ragingcomputer.com/2012/04/arduino-laser-turret-cat-toy" target="_blank">times</a> and is even <a href="http://www.amazon.com/FroliCat-DART-Automatic-Rotating-Laser/dp/B003Q6AG9K" target="_blank">manufactured</a> and <a href="http://www.petco.com/product/109349/FroliCat-BOLT-Laser-Pet-Toy.aspx" target="_blank">sold</a>, but, hey! The fun is in the journey and not the destination!<br />
<br />
This project is split in two parts. The first section will cover the physical system <i>per se</i> while the second one will discuss interactivity using a webcam.<br />
<br />
For part 1, you will need:<br />
<ul>
<li>A Laser;</li>
<li>Two servomotors;</li>
<li>A servo controller;</li>
</ul>
<h4>
Laser</h4>
I used <a href="http://dx.com/p/6mm-5mw-650nm-red-laser-diode-dc-5v-150906" target="_blank">a cheap diode and resistor combo from DealExtreme.com</a>, but any diode would do. Disassembling one you found at the flea market is another good alternative. While playing with it, remember that it is a <a href="http://en.wikipedia.org/wiki/Diode" target="_blank">diode</a><span style="font-weight: normal;">, which means that they <b>must</b> be supplied in current. Applying a voltage source to it will cause it to <b>burn</b>. How can I create a current supply, you ask? For low power applications, using a resistor in series with the diode would kick in its <a href="http://en.wikipedia.org/wiki/Ohm's_law" target="_blank">$ V= R·I $</a>, effectively limiting the current in the diode to $ \frac{V_{supply} - V_{diode}}{R} $, if you are interested by school stuff. That won't work on higher power application because the resistor's resistance varies with its temperature.</span>
<br />
<br />
Be careful to use a low power laser because you don't want to burn your cat's retina!
<br />
<h4>
Servomotors</h4>
<div>
For the servomotors, I've got my hands on two <a href="http://www.futaba-rc.com/servos/analog.html" target="_blank">Futaba S3004</a>, but <a href="http://popular.ebay.com/misc-n-z/servo-motors.htm" target="_blank">any</a> <a href="http://www.hobbypartz.com/servos.html" target="_blank">servomotor</a> <a href="http://www.robotshop.com/servo-motors.html" target="_blank">will</a> <a href="http://dx.com/s/servo" target="_blank">do</a>. You will want one with a disc horn large enough to support the second servo and the second one with arms, X or star-shaped. What are the differences? Check Figure 1.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhahqPSO_FsxUqpdhhMaiIEOkgW_Te6lyu2SC0yTec-gQrNFadjqCjpkkoBRH5A7uBwrexecWlQyvs_dYh7jaOUNY_jwDQ4LGtkwQnNnPtM6UuyxZSvQ0rnBRz-RMcSuGsQkwkdO1czdd8/s1600/20130902-_DSC0908-Modifier.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhahqPSO_FsxUqpdhhMaiIEOkgW_Te6lyu2SC0yTec-gQrNFadjqCjpkkoBRH5A7uBwrexecWlQyvs_dYh7jaOUNY_jwDQ4LGtkwQnNnPtM6UuyxZSvQ0rnBRz-RMcSuGsQkwkdO1czdd8/s1600/20130902-_DSC0908-Modifier.jpg" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><span style="font-size: small;">Figure 1: Types of horns. Sorry, no arm because I didn't had one in hand. Just think of it as an X with two opposites branches cut off.</span></td></tr>
</tbody></table>
<h4>
Servo controller</h4>
<div>
As for the servo controller, I will use a <a href="http://www.pololu.com/catalog/product/1350" target="_blank">Pololu Micro Maestro 6</a>, which can be seen in Figure 2.</div>
<div>
<br /></div>
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiT5tdiqh4bIUYAMPAi1YRB3s_6wpm8vGqtd1Pj-EwglezBsr0hIkUwYQBG_6wdhbiStOpSHcyYSyJpe6Bs9Ss1ufDaYiYS0ix4nrqDeZbhmY9iP1b8xWIclnpIoA0dhS3ycvXOU7K7fs4/s1600/20130902-_DSC0912-Modifier-2.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiT5tdiqh4bIUYAMPAi1YRB3s_6wpm8vGqtd1Pj-EwglezBsr0hIkUwYQBG_6wdhbiStOpSHcyYSyJpe6Bs9Ss1ufDaYiYS0ix4nrqDeZbhmY9iP1b8xWIclnpIoA0dhS3ycvXOU7K7fs4/s1600/20130902-_DSC0912-Modifier-2.jpg" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;"><br /></td></tr>
</tbody></table>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhntygeEdVJCqZx_AbK_gtp8jEoxMobbn3juwYvdjq0L2tsLBjkDVLcIq-MEcZo_-xVm6VsW4W_3mMRQGehuCtXxC2We1zw0Nk3yT-9PJg66UXp1IuqRNiPEaKE97IWZiogU-lhyAyVb-E/s1600/20130902-_DSC0917-Modifier-Modifier.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhntygeEdVJCqZx_AbK_gtp8jEoxMobbn3juwYvdjq0L2tsLBjkDVLcIq-MEcZo_-xVm6VsW4W_3mMRQGehuCtXxC2We1zw0Nk3yT-9PJg66UXp1IuqRNiPEaKE97IWZiogU-lhyAyVb-E/s1600/20130902-_DSC0917-Modifier-Modifier.jpg" /></a></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
The red wire is a bypass to power the servos using the 5V from the USB.</div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
Using an already made servo controller is the simplest solution but you can always </div>
<h4>
Hardware recap</h4>
<div>
<ul>
<li><span style="font-family: Courier New, Courier, monospace;">[~2$ ]</span> 1x Laser - Such as <a href="http://dx.com/p/6mm-5mw-red-laser-module-3-5-4-5v-13378" target="_blank">this one</a>;</li>
<li><span style="font-family: Courier New, Courier, monospace;">[~6$ ]</span> 2x Servomotor - Such as the <a href="http://dx.com/p/17g-plastic-pom-gear-analog-servo-for-rc-model-toy-black-dc-4-8-6v-201994" target="_blank">Tower Pro SG-5010</a>;</li>
<li><span style="font-family: Courier New, Courier, monospace;">[~20$]</span><span style="font-family: inherit;"> 1x Servo controller - Such as the <a href="http://www.robotshop.com/productinfo.aspx?pc=RB-Pol-101&lang=en-US" target="_blank">Pololu Micro Maestro 6</a>;</span></li>
</ul>
<div>
Your mileage may vary on prices, be sure to browse a bit before purchasing to find better or cheaper parts.</div>
</div>
<h4>
Design</h4>
<div>
The assembly is quite simple:</div>
<div>
<ol>
<li>Put a servo perpendicularly on the other one;</li>
<li>Fix the laser on the topmost servo;</li>
<li>Plug everything together.</li>
<ol>
<li>Servos on the Controller;</li>
<li>Laser and servos power to the USB power (available on the controller);</li>
<li>Controller to the computer.</li>
</ol>
</ol>
</div>
<div>
This is the look of the final design once you've plugged everything together.</div>
<div>
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_9FAG8K66IW7VTVeLrkJW2T9zu1Go4-R7zX5xEoOOq7CkxCWHheHO9RAZGidLoqzPkmhLdJc4vC5rkPdMlIehs2WOerqMSxOVb3IBy_N3hfA_HhsybNzuXwWasfV5beyjd3vRDaZmV2I/s1600/20130902-_DSC0920.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_9FAG8K66IW7VTVeLrkJW2T9zu1Go4-R7zX5xEoOOq7CkxCWHheHO9RAZGidLoqzPkmhLdJc4vC5rkPdMlIehs2WOerqMSxOVb3IBy_N3hfA_HhsybNzuXwWasfV5beyjd3vRDaZmV2I/s320/20130902-_DSC0920.jpg" height="270" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjM4PUeN34gKA83PENvNL1O5DEybT38OqIAy0Lb8SjQVVMD4uTW6OpP6-AnUStn3T6JBfu-f-cPDAcf7ga5BW0UbhQn2YYJG5bkHg0Z4fIcHIm8PX3LJvdTmaSkZ4ixVHztvx95z-VAbjM/s1600/20130902-_DSC0919.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjM4PUeN34gKA83PENvNL1O5DEybT38OqIAy0Lb8SjQVVMD4uTW6OpP6-AnUStn3T6JBfu-f-cPDAcf7ga5BW0UbhQn2YYJG5bkHg0Z4fIcHIm8PX3LJvdTmaSkZ4ixVHztvx95z-VAbjM/s320/20130902-_DSC0919.jpg" height="156" width="320" /></a></div>
<div>
<br /></div>
<h4>
Software</h4>
<div>
Now that the hardware parts are all ready, we can start coding. Here is a small example of randomly waving the laser around using Python.<br />
<br />
You will need to install pyserial, first. In fact, I recommend you to create a virtualenv with Python 3.3, install <i>setuptools</i> and <i>pip</i> and then execute <i>pip install pyserial</i>. This works on Linux, Windows and Mac.<br />
Make sure you've configured correctly your Pololu using the Maestro Control Center (provided with <a href="http://www.pololu.com/catalog/product/1350/resources" target="_blank">the driver</a>). This means activating either the USB Dual Port or USB Chain mode, setting the channels properly (I recommend putting every non-used channel to Input for safety) and setting the Mini SSC offset and Timeout to zero. The other options are optional.<br />
<br />
So here's the code:</div>
<div class="gistLoad" data-id="6414925" id="gist-6414925">
Loading ....<br />
<br /></div>
And there you go! Next up on Part 2 we'll see how to hook up this setup to a webcam using OpenCV to analyze movement and create organic-like movement. Stay tuned!<br />
<br />
I shamelessly offer you a couple of photos of my cat:<br />
<table border="0" bordercolor="transparent" cellpadding="3" cellspacing="3" style="background-color: transparent; margin: 0 auto;">
<tbody>
<tr>
<td><div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxU0ebjfl5ZKJzESdYGz5I7CdTMahtVFwfflEOJNDVv5Wm78n6HSsEJNK23CBxEpewQsk4xceayA0ZcHjDDnxbQDFYg0_QSkQ4Zs4924w4cSfpa_TaLCKCLe97MeWQBkDHm7ZvCXBfmG8/s1600/20130902-_DSC0926.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxU0ebjfl5ZKJzESdYGz5I7CdTMahtVFwfflEOJNDVv5Wm78n6HSsEJNK23CBxEpewQsk4xceayA0ZcHjDDnxbQDFYg0_QSkQ4Zs4924w4cSfpa_TaLCKCLe97MeWQBkDHm7ZvCXBfmG8/s320/20130902-_DSC0926.jpg" height="320" width="171" /></a></div>
</td>
<td><div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAzVaUJHWUB9Lp0I3oodtQPXh8nIxxPtmxMW8pCTMGeiDLktwWS1XeNaRhXH_8wN7KnJZ6G9Ok8ZanmEvRpRRVKT_igvE7QIV1f4kWTYg_Ko3Obe_QFJp1TyB7kzOPQRzgNpDUF_vjfdE/s1600/20130902-_DSC0929.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAzVaUJHWUB9Lp0I3oodtQPXh8nIxxPtmxMW8pCTMGeiDLktwWS1XeNaRhXH_8wN7KnJZ6G9Ok8ZanmEvRpRRVKT_igvE7QIV1f4kWTYg_Ko3Obe_QFJp1TyB7kzOPQRzgNpDUF_vjfdE/s320/20130902-_DSC0929.jpg" height="186" width="320" /></a></div>
</td>
</tr>
<tr>
<td><div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh38rFIOjRwmtn_80zwiFQxd76upPnJt7mvwx0ZGOP4xd8Zvqaqmjf7NhEct7wrPlAGJlsu0PaVKiW7EsiPwnoyH7JiMqTbd65BJjphvXj4dimgVIIx0yYn2ys1p_VlsFja41Nfayx2LvM/s1600/20130902-_DSC0930.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh38rFIOjRwmtn_80zwiFQxd76upPnJt7mvwx0ZGOP4xd8Zvqaqmjf7NhEct7wrPlAGJlsu0PaVKiW7EsiPwnoyH7JiMqTbd65BJjphvXj4dimgVIIx0yYn2ys1p_VlsFja41Nfayx2LvM/s320/20130902-_DSC0930.jpg" height="320" width="206" /></a></div>
</td>
<td><div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1GRQKB43IAbOWo0T6n96vBdf8Z_lkID-_agq-JuflqU3EgVxOVP1cYfyGgzWPrWHerb6gNEn_8OH-5n7mQsd4G9HGWgjEH9GlkUjMBO1sI9EAka5GQdQDhsGkV26UQ6KFpbdVR7GfiII/s1600/20130902-_DSC0931.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1GRQKB43IAbOWo0T6n96vBdf8Z_lkID-_agq-JuflqU3EgVxOVP1cYfyGgzWPrWHerb6gNEn_8OH-5n7mQsd4G9HGWgjEH9GlkUjMBO1sI9EAka5GQdQDhsGkV26UQ6KFpbdVR7GfiII/s320/20130902-_DSC0931.jpg" height="211" width="320" /></a></div>
</td>
</tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5tgJ90ycP9exEJkqCa9uQJ4OA2ViFZVVbutFI0gTXX7gDiKwxFmH_ttjrZJr_0_vIej-kATD_d87lcHzCw_noPCpDiuLs3Vlk_izh4B3q1KuKl2N7st9Ol2j7iz4VuyEp1JtGdKp7GD0/s1600/20130902-_DSC0932.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj5tgJ90ycP9exEJkqCa9uQJ4OA2ViFZVVbutFI0gTXX7gDiKwxFmH_ttjrZJr_0_vIej-kATD_d87lcHzCw_noPCpDiuLs3Vlk_izh4B3q1KuKl2N7st9Ol2j7iz4VuyEp1JtGdKp7GD0/s320/20130902-_DSC0932.jpg" height="317" width="320" /></a></div>Yannick Hold-Geoffroyhttp://www.blogger.com/profile/17227249978993145288noreply@blogger.com0tag:blogger.com,1999:blog-569950264826755092.post-32900947386766741102013-07-08T14:54:00.000-04:002014-05-06T17:28:56.794-04:00Project Euler Problem 1 - A journeyLet's solve efficiently the first <a href="http://projecteuler.net/problem=1" target="_blank">Euler Project problem</a>, which is:<br />
<blockquote class="tr_bq">
<div style="font-family: 'Trebuchet MS', sans-serif; font-size: 16px;">
If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.</div>
<div style="font-family: 'Trebuchet MS', sans-serif; font-size: 16px;">
Find the sum of all the multiples of 3 or 5 below 1000.</div>
</blockquote>
A naive implementation of this problem would be as such in Python:<br />
<div class="gistLoad" data-id="5337606" id="gist-5337606">
Loading ....</div>
This gives a <b>result of 233168.</b><br />
<br />
What we are really interested in, though, is the performance. Let's benchmark our result.<br />
<div class="gistLoad" data-id="5337619" id="gist-5337619">
Loading ....</div>
It runs <b>in 555.5 microseconds</b> using my computer.<br />
<br />
Now begins the fun. Two paths must be searched in order to attain maximal performance in this problem. The first is to enhance the mathematical search space, meaning that we should limit the number of numbers we evaluate. The first solution tried every positive natural number. It would be way more efficient to only run through multiples of 3 and 5. In Python, we can use the <a href="http://docs.python.org/3/library/functions.html#func-range" target="_blank"><b>range </b>iterator</a> (xrange for Python 2) to provide us only the interesting values. Concatenation of the two desired iterators can be done using the <a href="http://docs.python.org/3/library/itertools.html#itertools.chain" target="_blank"><b>chain</b> function</a>. It is then a simple matter of getting rid of duplicates, which we can do using the <a href="http://en.wikipedia.org/wiki/Set_(abstract_data_type)" target="_blank"><b>set</b> data structure</a>.<br />
<div class="gistLoad" data-id="5337633" id="gist-5337633">
Loading ....</div>
Giving us a <b>3.806 microseconds</b> run, always on the same computer.<br />
<br />
Ah. Much better. For a human being, this is the most optimal piece of core you can encounter solving your problem. It is short, relatively simple and quite easy to maintain.<br />
<br />
Everything further down in this post is an abomination and exists for the sole purpose of either educational or masochistic endeavours. Writing clear, simple and maintainable code is way more useful, profitable and developer-time efficient.<br />
<br />
Despite these warnings, let's dive into a C implementation of the naive interpretation of the problem.<br />
<div class="gistLoad" data-id="5337644" id="gist-5337644">
Loading ....</div>
10.75 microseconds without optimisations<br />
5.36 microseconds with -O3<br />
<br />
Quite more efficient than its Python counterpart, indeed. Let's write an iterator-like function giving out only interesting numbers. Instead of using high-level concepts such as the chain function or the set data structure which would be pretty complex to write in C, we'll simply use the fact that distance between multiples of two numbers repeats after their least common multiple (LCM). 3 and 5 have a LCM of 15, so let's enumerate multiple of both up to 15:<br />
<blockquote class="tr_bq">
3 5 6 9 10 12 15</blockquote>
This means that multiples of 3 and 5 will be a cycling loop of the difference between these numbers. This give us +3, +2, +1, +3, +1, +2, +3 (Beginning with 0, going to 3 is +3. 5 - 3 = +2, 6 - 5 = +1, and so on).<br />
We'll use the fact that switch cases are very efficient in C since they map to a jump case (multiple jump operations) in assembly language (compared to a bunch of <i>if</i>s which would map to comparisons operations). This gives us the following solution:<br />
<div class="gistLoad" data-id="5337666" id="gist-5337666">
Loading ....</div>
<div>
3.11 µs -O0</div>
<div>
1.55 µs -O3</div>
<div>
<br />
The C code seems to be on-par with the Python one without optimizations. But this is because of the timing mechanisms we used to benchmark both (one used in-code timestamps while the other one is using <i>time</i> which takes into account the process creation). This should not bias our benchmarks too much because of the repeating loop which minimizes the impact of the process creation overhead.<br />
<br />
This is the most performance you can get out of a portable and somewhat readable code. Anything further down in this post engulfs your senses in a warm delusional wool, or as someone stated it:<br />
<blockquote class="tr_bq">
<span style="background-color: white; color: #444444; font-family: Arial, 'Liberation Sans', 'DejaVu Sans', sans-serif; font-size: 13px; line-height: 17px;">If you put something like that in production code, you would likely be shot by the maintainer. A jury would never convict him. <a href="http://stackoverflow.com/questions/1732348/" target="_blank">[1]</a></span></blockquote>
I must insist: The rest of this is so bad that you will think that you are right doing so and have the illusory though that it makes you a master of a programming language or computer systems.<br />
<br />
Which is wrong.</div>
<div>
<br /></div>
For the foolish still willing to carry on:<br />
<br />
<div>
Skills in parallel development are <b>really</b> important nowadays. But let's put this in perspective: we've got a solution running at the microsecond scale. Spawning a new thread or process by the OS will generate a relative overhead way over any speedup could ever justify. Task-level parallelism won't help us for this particular problem.</div>
<div>
<br /></div>
<div>
But we can take advantage of parallel instructions available in current processors through the MMX/SSE/AVX extensions. Data-level parallelism to the rescue!</div>
<div>
<br /></div>
<div>
A generally clean approach to implement calls to CPU extensions in C is to use intrinsics.<br />
<br />
One way to solve our problem using SSE is to generate every 7 previously mentioned entries (from 0 to 15) of the cycle using a single addition operation. We then use the addition operation to perform a sum operation using a reduction pattern as shown in figure 1. It is only a matter of sum after this.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDjgoQEjZ7esowU5c31RtPZeEYy26rf2hwIapCUvhnuS6hiRa4GP2uuv__Qnlx5a6d2pICh-qSISRll9BzzXoPy58rzGzVMJ6fYSuOf71SYXQZ2o4qUd5Il2fMAxg3WwJqlMrj9IGpNxU/s1600/Horizontal+add+SSE2+(1).png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="287" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDjgoQEjZ7esowU5c31RtPZeEYy26rf2hwIapCUvhnuS6hiRa4GP2uuv__Qnlx5a6d2pICh-qSISRll9BzzXoPy58rzGzVMJ6fYSuOf71SYXQZ2o4qUd5Il2fMAxg3WwJqlMrj9IGpNxU/s640/Horizontal+add+SSE2+(1).png" width="640" /></a></div>
<div style="text-align: center;">
Figure 1: Example integer sum reduction pattern using addition (SSE2)</div>
</div>
<div>
<br />
The cleanest way to have done it would be letting hints or rearranging the original code for the compiler to optimize our program using SIMD instructions himself, but I couldn't find a way to cleanly formulate my needs to get GCC to understand what I wanted. Written manually, it gives us the following code:<br />
<div class="gistLoad" data-id="5346143" id="gist-5346143">
Loading ....</div>
The codes run in <b>357 ns</b> on my computer using -O3 compiler flag.<br />
<br />
But we are still using a linear complexity algorithm... Isn't there something more efficient in the mathematical search space?<br />
<br />
Actually, yes. There is.<br />
<br />
There is an identity out there stating that sum of a multiple can be found directly. Using the inclusion-exclusion principle, we can find the answer for many multiples. But I'll let <a href="http://math.stackexchange.com/questions/9259/find-the-sum-of-all-the-multiples-of-3-or-5-below-1000" target="_blank">others explain that</a>. Let's see its implementation.<br />
<div class="gistLoad" data-id="5951200" id="gist-5951200">
Loading ....</div>
Sorry about the startup script skipping showoff. Comparing the execution time of this operation is irrelevant because the computation is done by the compiler and the answer is directly written to a registry as a literal constant.</div>
<br />
So there's two things to remember of this:<br />
<br />
Nothing is more important than knowing what we're doing;<br />
<br />
The journey is more important than the destination.<br />
<br />
<br />Yannick Hold-Geoffroyhttp://www.blogger.com/profile/17227249978993145288noreply@blogger.com0