September 15, 2013

Review of USB Condoms

We're in danger

Computer software anomalies often have their biological analogies such as viruses or bugs. Transmission of these virtual infectious diseases by physical contact is observed since the dawn of the digital age, be it on floppy disks, optical disks or USB drives. But recently, a new trend of threat transmission targeting mobile systems began to spread. Since the mobile era, more and more devices (cellphones, music players, etc.) use their USB connector for multiple purposes such as data transfer and charge. This allows malicious computer programs to inject malware on a device when its owner only wants to charge it. Worst: one can modify a simple charger to propagate these treats. The problem? It begins to spread to public and mainstream systems.

The solution to these virtual physically transmitted diseases? An USB Condom! Available through int3.cc, this simple board only connects the power pins through while cutting the data pins. Simple and clever. But can we presumptuously remove half the USB 2.0 wires without side effects?

The current state of affairs


The USB connectors are defined with 4 (2.0) or 10 (3.0) connections which always contain two power pins (5V and ground) in addition to their data pins.

Let's read some power specifications directly from the ultimate reference, the Kamasutra of USB, the official USB 2.0 standard documentation, pages 171 and 245:
Depending on the power capabilities of the port to which the device is attached, a USB device may be able to draw up to five unit loads from V BUS after configuration. A unit load is defined to be 100 mA.
It is also written on page 178 of the USB 2.0 specifications document that a device described as high-power should require a maximum of 500 mA on the 5 V power rail of the USB connector after being configured and 100 mA maximum during power-up (page 171).

The USB 3.1 specification rise the high-power current draw to a maximum of 900 mA (6 unit loads of 150 mA) when the client device is configured but states, on chapter 11.4 of its specifications:
Note that a USB 3.1 peripheral device shall not draw more than 100 mA until it detects far-end Rx terminations in the unconfigured state.
Finally, - the last quote from the specifications, I promise - the battery charging specification allows (in its revision 1.2 from December 2010) to draw up to a whooping 1500 mA from an USB port of a compliant device for charging purposes without the need to connect the data pins (after a delay of maximum 900 ms). But this amazing power is only available through hardware modifications of the original USB host specification, which most computers and laptops don't implement.

It should be noted that many manufacturers of portable devices and adapters do not follow these specifications. Most wall adapters can deliver more power than any standard specification, even the battery charging specification. For example, the ODROID-U2 uses an EIAJ-01-like connector to transmit 10 Watts! But who will condemn this practice - aside the firefighters - when it means faster charge times?

With this kind of devices, the USB Condom will do a good job protecting your beloved electronics from electronic STDs while allowing them to charge freely.

But you won't have this luck when charging from a computer, a laptop or any other power delivering device that is more likely to follow specifications. The USB enumeration process, which uses the data pins, will likely fail when using the USB Condom. The device providing power will not be able to read the configuration registers stating the high power requirement of the charging device. This will produce a maximum charge of 100 mA, which is the maximum allowed current during power-up (unconfigured state) and which coincidentally can be near or under the device power consumption. Result? An awfully slow charge or even no charge at all!

This behavior particularly hinders the utility of the USB Condom since its goal is to charge your portable devices while protecting it from malicious communications which are mainly from computers - which may have viruses - or a fraudulent charger containing an embedded microcontroller delivering malwares. Using the USB Condom will result in a slow or no charge from these sources, depending on how much the pirate manufacturer of the charger respected the specifications. It's a shame; when you put a condom, it should draw some satisfaction!

There shall be light

One way to circumvent this problem would be to include a USB charger detector IC or a simple microcontroller with an USB stack such as the ones available from Microchip. While keeping the power connectors linked to the protected device, its data pins would be replaced by the microcontroller's pins that would configure as a generic high-power device while ignoring any communication attempt. Sadly, this will boost the manufacturing price of the USB Condoms. But it will allow the charging sequence to respect the original USB specifications. This new device could even capture every malware injection attempt and create an Virtual STD Honeypot! What an appetizing idea!

September 2, 2013

How to lazily entertain a cat (Part 1)

Laser-powered Cat Entertainment System (LCES)

Cats love lasers.

So let's build a Laser-powered Cat Entertainment System (LCES). It's a fancy name for a laser that shakes autonomously. I know this has already been done a million times and is even manufactured and sold, but, hey! The fun is in the journey and not the destination!

This project is split in two parts. The first section will cover the physical system per se while the second one will discuss interactivity using a webcam.

For part 1, you will need:
  • A Laser;
  • Two servomotors;
  • A servo controller;

Laser

I used a cheap diode and resistor combo from DealExtreme.com, but any diode would do. Disassembling one you found at the flea market is another good alternative. While playing with it, remember that it is a diode, which means that they must be supplied in current. Applying a voltage source to it will cause it to burn. How can I create a current supply, you ask? For low power applications, using a resistor in series with the diode would kick in its $ V= R·I $, effectively limiting the current in the diode to $ \frac{V_{supply} - V_{diode}}{R} $, if you are interested by school stuff. That won't work on higher power application because the resistor's resistance varies with its temperature.

Be careful to use a low power laser because you don't want to burn your cat's retina!

Servomotors

For the servomotors, I've got my hands on two Futaba S3004, but any servomotor will do. You will want one with a disc horn large enough to support the second servo and the second one with arms, X or star-shaped. What are the differences? Check Figure 1.

Figure 1: Types of horns. Sorry, no arm because I didn't had one in hand. Just think of it as an X with two opposites branches cut off.

Servo controller

As for the servo controller, I will use a Pololu Micro Maestro 6, which can be seen in Figure 2.




The red wire is a bypass to power the servos using the 5V from the USB.

Using an already made servo controller is the simplest solution but you can always 

Hardware recap

Your mileage may vary on prices, be sure to browse a bit before purchasing to find better or cheaper parts.

Design

The assembly is quite simple:
  1. Put a servo perpendicularly on the other one;
  2. Fix the laser on the topmost servo;
  3. Plug everything together.
    1. Servos on the Controller;
    2. Laser and servos power to the USB power (available on the controller);
    3. Controller to the computer.
This is the look of the final design once you've plugged everything together.


Software

Now that the hardware parts are all ready, we can start coding. Here is a small example of randomly waving the laser around using Python.

You will need to install pyserial, first. In fact, I recommend you to create a virtualenv with Python 3.3, install setuptools and pip and then execute pip install pyserial. This works on Linux, Windows and Mac.
Make sure you've configured correctly your Pololu using the Maestro Control Center (provided with the driver). This means activating either the USB Dual Port or USB Chain mode, setting the channels properly (I recommend putting every non-used channel to Input for safety) and setting the Mini SSC offset and Timeout to zero. The other options are optional.

So here's the code:
Loading ....

And there you go! Next up on Part 2 we'll see how to hook up this setup to a webcam using OpenCV to analyze movement and create organic-like movement. Stay tuned!

I shamelessly offer you a couple of photos of my cat:

July 8, 2013

Project Euler Problem 1 - A journey

Let's solve efficiently the first Euler Project problem, which is:
If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000.
A naive implementation of this problem would be as such in Python:
Loading ....
This gives a result of 233168.

What we are really interested in, though, is the performance. Let's benchmark our result.
Loading ....
It runs in 555.5 microseconds using my computer.

Now begins the fun. Two paths must be searched in order to attain maximal performance in this problem. The first is to enhance the mathematical search space, meaning that we should limit the number of numbers we evaluate. The first solution tried every positive natural number. It would be way more efficient to only run through multiples of 3 and 5. In Python, we can use the range iterator (xrange for Python 2) to provide us only the interesting values. Concatenation of the two desired iterators can be done using the chain function. It is then a simple matter of getting rid of duplicates, which we can do using the set data structure.
Loading ....
Giving us a 3.806 microseconds run, always on the same computer.

Ah. Much better. For a human being, this is the most optimal piece of core you can encounter solving your problem. It is short, relatively simple and quite easy to maintain.

Everything further down in this post is an abomination and exists for the sole purpose of either educational or masochistic endeavours. Writing clear, simple and maintainable code is way more useful, profitable and developer-time efficient.

Despite these warnings, let's dive into a C implementation of the naive interpretation of the problem.
Loading ....
10.75 microseconds without optimisations
5.36 microseconds with -O3

Quite more efficient than its Python counterpart, indeed. Let's write an iterator-like function giving out only interesting numbers. Instead of using high-level concepts such as the chain function or the set data structure which would be pretty complex to write in C, we'll simply use the fact that distance between multiples of two numbers repeats after their least common multiple (LCM). 3 and 5 have a LCM of 15, so let's enumerate multiple of both up to 15:
3 5 6 9 10 12 15
This means that multiples of 3 and 5 will be a cycling loop of the difference between these numbers. This give us +3, +2, +1, +3, +1, +2, +3 (Beginning with 0, going to 3 is +3. 5 - 3 = +2, 6 - 5 = +1, and so on).
We'll use the fact that switch cases are very efficient in C since they map to a jump case (multiple jump operations) in assembly language (compared to a bunch of ifs which would map to comparisons operations). This gives us the following solution:
Loading ....
3.11 µs -O0
1.55 µs -O3

The C code seems to be on-par with the Python one without optimizations. But this is because of the timing mechanisms we used to benchmark both (one used in-code timestamps while the other one is using time which takes into account the process creation). This should not bias our benchmarks too much because of the repeating loop which minimizes the impact of the process creation overhead.

This is the most performance you can get out of a portable and somewhat readable code. Anything further down in this post engulfs your senses in a warm delusional wool, or as someone stated it:
If you put something like that in production code, you would likely be shot by the maintainer. A jury would never convict him. [1]
I must insist: The rest of this is so bad that you will think that you are right doing so and have the illusory though that it makes you a master of a programming language or computer systems.

Which is wrong.

For the foolish still willing to carry on:

Skills in parallel development are really important nowadays. But let's put this in perspective: we've got a solution running at the microsecond scale. Spawning a new thread or process by the OS will generate a relative overhead way over any speedup could ever justify. Task-level parallelism won't help us for this particular problem.

But we can take advantage of parallel instructions available in current processors through the MMX/SSE/AVX extensions. Data-level parallelism to the rescue!

A generally clean approach to implement calls to CPU extensions in C is to use intrinsics.

One way to solve our problem using SSE is to generate every 7 previously mentioned entries (from 0 to 15) of the cycle using a single addition operation. We then use the addition operation to perform a sum operation using a reduction pattern as shown in figure 1. It is only a matter of sum after this.
Figure 1: Example integer sum reduction pattern using addition (SSE2)

The cleanest way to have done it would be letting hints or rearranging the original code for the compiler to optimize our program using SIMD instructions himself, but I couldn't find a way to cleanly formulate my needs to get GCC to understand what I wanted. Written manually, it gives us the following code:
Loading ....
The codes run in 357 ns on my computer using -O3 compiler flag.

But we are still using a linear complexity algorithm... Isn't there something more efficient in the mathematical search space?

Actually, yes. There is.

There is an identity out there stating that sum of a multiple can be found directly. Using the inclusion-exclusion principle, we can find the answer for many multiples. But I'll let others explain that. Let's see its implementation.
Loading ....
Sorry about the startup script skipping showoff. Comparing the execution time of this operation is irrelevant because the computation is done by the compiler and the answer is directly written to a registry as a literal constant.

So there's two things to remember of this:

Nothing is more important than knowing what we're doing;

The journey is more important than the destination.