CUDA LOGICAL MODEL

NVIDIA’s CUDA programming language is a wonderful way of exploiting their graphics cards (Graphics Processing Units) for general purpose computing (GPGPU).  Effectively CUDA offers access to the cards via a relatively simple syntax extension to C (one of my favourite languages).

However I’ve found the descriptions of the mixture of grids, blocks, warps and threads that one must understand in order to program in CUDA to not be especially well defined anywhere.  I think some of this problem derives from a muddle between the logical and physical model.  CUDA offers a generic logical model : there is no real enforcement that the hardware will be a GPU – in fact, one can compile for a normal processor.  However there is one key constraint : groups of threads in a block are assumed to be able to communicate quickly internally, but blocks are expected to be independent (while yet processing identical code) and able to operate asynchronously relative to other blocks.

To help myself, I’ve put together the following diagram of what I understand the logical model to be.  I believe it is correct, but please post any corrections you find.

CUDA logical model

It is perhaps a little weak on the different types of memory available, being primarily focussed on thread execution, but it does highlight some points.

To properly program in CUDA you do also need to take account of the physical factors relating to resources attached to Streaming Multiprocessors, especially available shared memory, and the bandwidth-latency product.  However introducing these factors into the logical model tends (I believe) to confuse.   The logical model is therefore a first step.

Troubleshooting Bicycle Dynohub and LED light conflict.

I bought a Shimano bicycle Hub dynamo and a Raleigh RSP2 LED bicycle headlight. When working together they are very impressive, casting a beam that makes night cycling much easier with an impressive ability to see and be seen. However from the first there were problems, such as the light occasionally stopping working and the on/off switch never working at all. Eventually the system stopped entirely. As I had bought a second headlight, originally for another bicycle, I fitted that and sought to troubleshoot the first.

The light has no meaningful instructions, despite having two wires (of different colours) and two (unmarked) terminals. This had made the initial installation difficult, especially as I also wired a rear light that did have polarity marked on its terminals. The rear light turned out not to be the problem.

To put the conclusion first : the Shimano hub connects one side of its AC output to the axle and hence the bicycle frame. The RSP light connects one side of its DC circuit to its mounting bracket and hence (usually) the frame. It is almost certain that any installation of these components will conflict. Depending on the specific wiring the light may however appear fine, although it flickers more at slow speed than necessary. It will however be being operated without current limiting and presumably be likely to fail prematurely. More details are below, but it is disappointing that this problem is not well publicised on bicycle fora.

The light seems to be a sealed unit and eventually I had to destroy it to open it and inspect the circuit. Investigations before I opened it gave no real clue as to the specific wires/terminals : in fact, all seemed to be shorted together according to my multimeter.

On disassembly the positions became clear. The external wires and the terminals are directly connected in pairs (specifically the black/white wire connects to the left-hand terminal when viewed from the rear, and the white wire to the other). Internally the wires are connected by a Transient Voltage Suppressor (P6KE9.1CA) i.e. it clips the AC to a nominal 9.1V peak value. This seemed to be faulty, conducting at all voltages, hence creating the apparent short between terminals seen by my multimeter. I suspect this to be the fault that finally caused my lamp to fail, but not the primary problem as discussed below.

The black/white wire is only distinguished from the white wire in that the latter is internally connected to the on/off switch. This is consistent with the lack of exterior markings on the terminals, although somewhat inconsistent with using different colour wires at all. It should not really matter which wire is which since they are connected as the AC inputs to a bridge rectifier. However it does actually matter in the fault condition described below, because the switch is attached to one particular wire.

The DC side of the circuit consists of two resistors in parallel, the pair then in series with the LED. In principle this is a simple circuit, typical for a LED, of a current limiting resistor on a DC circuit, with the bridge rectifier converting the AC to DC and letting the LED illuminate on both the positive and negative AC cycles. However the positive side of the LED (in fact the whole metal plate on which it is mounted) is in contact with an aluminium spar that forms the backbone of the light and presents a threaded hole for attachment to the bracket and hence the bicycle. This is likely to lead to electrical contact with the bicycle frame, especially as a metal bracket is supplied – assembled and connected to this spar. This spar may well also form a heatsink as it is in electrical and thermal contact with the LED backplate and is a piece of costly aluminium used where plastic would have been structurally adequate, so removing it may be unwise.

The electrical contact to the frame is the problem. Unlike some hub dynamos, Shimano hub dynamos are also electrically connected to the bicycle frame and for this light this shorts one AC input with the DC positive line. This is an especial problem as it circumvents the current limiting resistors used with the LED. There are now effectively two cases depending on whether it is the white or the black/white wire that is connected to the frame (recall there are no instructions). If the white wire is connected to the frame, then the on/off switch will work as expected and the light will appear to work fine. However what is actually happening is that the LED is only active on half the cycles and is working without the current limiting resistor. On the other half cycles, current still flows through a pair of diodes but the LED is out of the circuit. In the other case where the black/white wire is the one connected to the frame, the situation will be the same except that the on/off switch will not work (the light will always be connected somehow). This latter case matches exactly the symptoms I saw, although I did not realise the low speed flickering was worse than the design intent (given an AC supply, with no smoothing, some flicker is inevitable).

Solution

As the light’s case does not appear to be designed to open without destroying the headlight, an external solution is preferable. For attaching the light, the spar has a threaded M4 hole in a block with a local diameter around 8mm, i.e. could be drilled out to 5mm or 6mm diameter and an insulating sleeve placed around the original M4 bolt. A simpler technique without risking voiding the warranty is to use an insulating bolt (e.g. nylon) in the original hole (M4 at 25mm long will suit). Both of these techniques should be possible without opening the case – I have only tried the latter. The net effect is to insulate the light’s DC circuit from the bicycle frame and hence avoid a short to the AC circuit.

Experiments

To probe whether the above analysis was correct, I used a MCP3008 Analogue to Digital Converter connected to a Raspberry Pi Nano to achieve around a 1 kHz sampling rate of both voltage and current, the latter using a ACS712-based sensor to convert current to voltage. The results are as follows, and confirm the above analysis as far as I can tell :results

Wider applicability

I have only tested one light from one manufacturer, and cannot tell if other lights are affected. However the fact that the symptoms were not clear cut could mean the problem is widespread yet unreported and many people are operating lights with greater flickering, without current limitation and with the unknown risk of premature failure.

Summary

When using a hub dynamo that is not electrically insulated from the bicycle frame, the RSP2 light I bought needs to be installed carefully (with changes to its supplied components) to avoid appearing to work while actually operating out of specification, adding extra drag and perhaps being damaged. The lack of any manufacturer instructions (let alone ones covering this issue) make it highly likely the light will be installed wrongly. The problem might be widespread. This cost me at least one LED headlight. I hope the article helps you avoid that.

Car Battery Tester on Steroids

Draper make a car battery tester that is a resistive element, designed to draw 100A from a 12V car battery; a toggle switch; and an analog voltmeter. This is all housed in a case and has clamps to connect to the battery. The analog voltmeter is read by eye after the load is applied for 15s. Ultimately this is crying out to be hacked into something more interesting. That ‘something’ is now an automated system that monitors both current and voltage over time and can export results.

Battery Tester.png

My hacked device is based around an Atmel AT Mega 8 microcontroller. The device is designed to firstly draw power from the 12V car battery under test, and on command from a trigger switch, record a test within its EEPROM. When disconnected from the car battery and connected to a USB host, it senses that fact (via voltage levels) and powers itself instead from the USB bus and (by pretending to be a USB keyboard) now writes the EEPROM values back to that USB host as tab-separated number columns. This is after an initial preamble describing the system, the test conditions and the firmware version. This can be used to populate a spreadsheet, and the results graphed. This is an example of the results from a brand new battery, showing the initial voltage, the drop under load, the recovery over time, and (in red) the current load :

Test example

The data is produced by two inputs to the microcontroller’s ADC pins. Firstly a resister potential divider is used to map the ~12V battery voltage to something below the 5V regulated supply to the microcontroller (High resistance values should be used to limit current). Hence a voltage trace is obtained. Secondly a solid state current sensor (an ACS755LCB-130-PFF, Hall Effect High Current Sensor) is inserted into the current path. It produces a voltage proportional to current, with the voltage read by another ADC pin on the microcontroller.

Hardware Design

The tester I obtained second-hand from eBay had a defective toggle switch, and the obvious replacement to allow automated operation was a car starter solenoid since these are effectively relays designed for this very purpose. I obtained one second-hand on eBay, and (when driven by an intermediate FET) this was happily controlled by the microcontroller. This was a few years ago, and it now looks as though 100A solid state relays can be obtained for similar prices : this might be a better idea, as it would also avoid a safety problem that I had to design around (below). That said, the solenoid provides a nice audio indication of the test starting and stopping.

The device operates well within the capabilities of the AT Mega 8. There are outputs to four LEDs (some bicolour) for status information to the user. Inputs are either ADCs; or to sense the microswitch or the solenoid status. A socket to the ISP pins are provided to allow firmware update.

As noted above, the device emulates a USB keyboard and this is achieved using the V-USB Driver, with structures/ideas from Frank Zhao’s USB business card.

I had thought this would be simpler than the Ethernet connection I used on, for example, my clamp meter conversion, but other than having the advantage that USB provides power, it was just as complex. I had been thinking also that USB was more universal and does not require bespoke software on the reading machine (especially as I might need to loan the device to people), however now that I can implement a (rudimentary) HTTP server on the microcontroller it would have been far simpler to access the data via a web server and I would do this if I undertook the project again. Alternatively writing direct to a flash card is an option, and this would both obviate the need for a separate communication system and solve the problem of limited EEPROM storage (below).

In terms of space, the starter solenoid, FET and current sensor fit within the original envelope of the tester. The lower power components, although they could possibly have fitted as well, are kept away from the heat and large currents by mounting in an old floppy drive metal case, which happens to have the same width as the tester and is riveted to stand proud above an air-gap atop the case. The whole is robust enough to live in the garage.

Software Design

The microcontroler is programmed in C, and primarily operates as a finite state machine moving from pre-test to test etc on command or timer. Overlaying that is a regular timed interrupt (10Hz) to service the LEDs etc. Some of the LED behaviour (e.g. mark-space flicker) is designed to minimise the power drain on the USB (theoretically only 100mA is available and, while the microcontroller uses very little, four LEDs can eat into that)

The primary design constraint is the size of the EEPROM into which the data is stored (512 bytes). Although the AT Mega 8 has 2kB of SRAM, the operation mode for this device has it powered down between connection with the battery under test and re-energised (perhaps days or hours later) to connect to a computer. There are several trade-offs possible – the sample rate for instance varies for the pre-test, test, and recovery phases deliberately to balance maximum information at interesting times with sample period length. Only 8 bit data is stored, and the voltage is scaled before storage to maximise resolution, based on an assumption of allowable values. The firmware had to be changed once I discovered it was not safe to assume that a new battery would not exceed 12.5V unloaded.

During the actual test period, the device measures current as well as voltage. As there are spare ADC lines, I also recorded the voltage across the heating element during this period (i.e. excluding any voltage drop across the solenoid) at the expense of the need for some more storage, but this turned out to not differ significantly from the full battery voltage.

Safety

There are several safety issues, in addition to those already inherent in the device and working around car batteries. Notably, as a 1kW heater the device could burn people or potentially start fires. However adding an element of software control brings new problems.

I designed the main safety feature as follows. The test is initiated by closing and holding a microswitch. The microcontroller will terminate the load test after 15s, but for safety it must also terminate if the microswitch is released sooner. The important safety element of the design was to ensure that these were independent systems : it would be no use releasing the switch if this merely told the microcontroller to stop as this would leave the safety problem in software and not create an independent system. However it was also necessary for the initial press on the switch to tell the microcontroller to start; and for release to be sensed by the microcontroller so that it would not reapply the current if the switch reconnected. The solution was to tie the FET gate to 12V via a 10k resistor. The microswitch then connects the gate to a lower voltage derived from a pin on the microcontroller and potentially overpowers the first resistor, but with the microswitch open, the microcontroller gets no vote. Initially this pin on the microcontroller is set to be an input, without an internal pull up. Hence it senses when the microswitch operates, but the FET is not activated. The microcontroller switches the pin to be an output, but leaves it high (5V) which still does not activate the FET. The recording cycle starts, and at the appropriate point the pin is driven low, activating the FET. The solenoid has a convenient 12V output when activated, allowing a LED to indicate the test is underway, and a separate input on the microcontroller to sense this fact. Since releasing the microswitch will unilaterally close the FET, this second sense input allows the microcontroller to cancel the cycle (preventing restart) in this abort condition. Altogether this should ensure the device can always be stopped safely without ever requiring the clamps to be released under load with the associated spark hazard.

A significant safety issue is that, while the original metal box is insulated from both battery poles, the use of a metal starter solenoid whose case was connected to its earth made it necessary to earth the tester’s metal box (as the solenoid was too large to be isolated away from the box, indeed it required some metal cutting to fit). This increases the risk that the positive pole might short to earth; but also has the particular problem that if used to test a battery in a car with a positive earth (unusual, but not unknown) there would be a great risk that the tester case would touch bodywork and short. The solution was simply to only connect the case to earth via a fuse of high enough value to allow the solenoid to operate, but low enough to blow otherwise. 10A was suitable.

Possibly unnecessarily, but I protected every microcontroller input with a tranzorb spike suppressor of 4.7V. They are cheap, and it was not clear how switching 100A in close proximity would affect the controller. A 15V transorb protects the whole device (in parallel across the 12V input), as in series does a 1A blade fuse and a diode to protect against reversed connections. Despite all this, I believe it is wise to enforce a prohibition about connecting the device to a computer while the test is on going, relying instead on storage.

Operation

The functionality of the machine is probably fairly clear from the instructions now printed on the faceplate (below) and the graph of outputs.

Faceplate.png

Cisolate : PCB construction

I have the greatest respect for the Visolate program, currently available at GitHub. It produces Voronoi Isolations of PCB traces enabling PCBs to be produced on a CNC mill. The milled paths are centralised in the available space thereby maximising current carrying capacity and allowing a single milling pass to separate two traces.   However Visolate was initially a research project and unsupported for a while. As that time, I started to develop an alternative, based on a Cellular Automata approach, which also met my needs by performing some optimisation of the cutting paths (in order to reduce milling time) and producing a (likewise optimised) list of drill points as a G-code file. As this program is based on a Cellular Automaton, rather than a Voronoi approach, I have always called it Cisolate (although actually it may just be a different way of producing Voronoi diagrams). Cisolate is also available on GitHub. Here is an example of some of Cisolate’s output for the test PCB used by Visolate:

Results.png

On the left is the board with the isolation paths identified in green. In the middle is the automatically identified set of drill positions and an approximation to the optimum order in which to drill them. On the right is again the isolation paths, but in red are the unnecessary transits (where the mill lifts off the board and moves to a new location). The quality of the optimisation is indicated by the limited number of red paths.

On this particular example, all processing was complete in 10 seconds on a not especially recent machine, although note that the code operates in parallel using multiple threads (configurable) and can therefore use all available processors. Significantly more complex boards are still analysed in minutes, with the majority of the work being in the optimisations.

While having no wish to criticise Visolate, I think Cisolate offers the following advantages

1. Cisolate uses native Java libraries, so there is no need to install anything beyond the JRE (This was a design intent)

2. Cisolate operates purely from an image, so can be used in various situations where a Gerber file is not available, including potentially repair after scanning a PCB image. This is paired with the weakness that it cannot directly use Gerber files as input, but other programs will convert Gerber files to images.

3. Cisolate undertakes optimisation of the G-code so that it produces far fewer transits (for reduced manufacture time and less wear-and-tear) and shorter code.

4. Cisolate automatically produces a G-Code Drill file (as well as a G-Code mill file and/or an etch image). Likewise there is a disadvantage that a PCB image must include the drill points : not having drill points does not just mean there is no drill file, it means traces are not anchored and are seen as dead ends that the process removes.

There are two other interesting outputs from Cisolate as shown below:

Others.png

On the left is the automatic determination, based on the Cellular Automata output, of which points are three-way intersections (blue) or four-way intersections (magenta) in the milling path. This enables the extraction of individual line segments that can then be combined in an optimum order. I was especially impressed with the CA, which produces highly robust results. The lack of any need to process special cases was notable making the coding much easier. On the right is a ‘heat map’ which indicates the generation of the CA by which a given point had been determined. The benefit of this heat map is that it allows the later path optimiser to have an idea of the freedom it has to smooth or move a path. Black indicates original copper and may not be encroached upon. The more white the area, the greater freedom there is to change (with white itself being the original ‘watershed’). White dots are drill points and cannot move as they are surrounded by copper.

Methodology

The overall process is for a Cellular Automata to thin the areas between the copper tracks to their median positions. The advantage of the CA is that its output is kingwise-connected pixels such that the nodes of the diagram are clearly identifiable via certain pixel patterns. One-way intersections are points – ipso facto drill locations. Three- and four-way intersections are also robustly determined (diagram above). Given the identification of the nodes, this makes it easy and unambiguous to extract the individual lines between nodes as individual paths to mill. There are now three optimisations. The order in which the drill points should be drilled is a classic travelling salesman optimisation. The mill traces are similar – but note they are lines not points, and only the transits between the start and end of the lines are optimised at this point. There may also be a need to reverse the order in which a line is traversed. Finally the milling lines themselves are optimised (smoothed) by attempting to fit circular arcs to them subject to the heat map information. This minimises the number of individual G-code lines, within certain constraints. It leads to huge reductions in file size and probably allows much better look ahead optimisation by the ultimate cutter, at the expense of slightly worse fit to the exact solution

The core engine implements a cellular automata method, derived from that due to Guo and Hall (Parallel Thinning with Two Sub-iteration Algorithms, Communications of the ACM 32(1989) 359-373,759) also described in Knuth TAoCP 4A (7.1.3) for thinning an image. I had to add some further cells (shown below) to remove dead-ends (you do not want to waste time milling an electrical isolation path that doesn’t reach another path as it won’t isolate anyway). Note however that this is the reason that drill points must be marked on the image (see above). The lack of a drill point makes a pad look like a needless salient that can be removed, whereas the drill points anchor it.

Extra cells

The optimisation currently uses simulated annealing and seems to work well, although SA always has more tuneable parameters than I really like.

In the absence of a standard pattern, Cisolate uses the following design pattern for the Travelling Salesman optimisations, which separates the Optimiser (currently Simulated Annealing) from the Seeker (that records the current result and provides alternative perturbations to try) and provides an Optimee function as a metric to optimise against. The whole is a Runnable, as optimisation is a slow process so the ability to pass it to parallel threads is desirable.

Optimisation Pattern

This pattern is not used for the G-code smoothing optimisation as that just conducts an exhaustive search based on the fact that G-code only allows straight lines or circular arcs. Hence these are the key primitives. Arbitrarily, allowable arcs are constrained to a fixed set to ease computation. Again, arbitrarily, the radii of this set follow a geometric progression. The largest radius is set s.t. w.l.g. it may be considered a straight line. Hence, during the computation, only ‘circular’ arcs are considered, some of which are later assigned as straight lines when the G-Code is written.

There is a ‘de minimus’ concept in the optimisation to decide whether it is worth worrying about differences, which allows a reduction in the number of different sequences on this basis.

The process is to first determine fixed control points and then, subject to avoiding copper traces, conduct a greedy optimisation to find the best (closest fit to original curve) arc through each pair of consecutive control points. Then again greedily, join adjacent control points that have identical radii between them. The process continues by merge further adjacent curves if a compromise radius does not violate a de minimus setting and then jitters control points to facilitate further merger.

This screenshot of the program shows the effect of optimisation on the G-code after the curves have been straightened (note especially the situation near the top right). The individual tabs (Board, Isolations, Drill, Drill Path …) show different views of the situation (as used for illustrations throughout this blog) and they are all linked together, so that pan and zoom affects them all at once and the views can be tabbed between.

Sceeenshot

Finally, for a more complex board, this is this board from an individual’s projects on the web to prove that Cisolate can just work with an image directly.  After a small amount of tidying (there is no point in ‘isolating’ text or keeping orphan pieces of copper). After processing for fewer than 5 minutes, these are the results (you may need to zoom in):

drillPaths

milling.jpg

Status

Having tested on about twenty boards, I am confident that the technical processing is correct and that a GUI and command line version are working so that I can use them. At the moment however the system is not user friendly – for instance only 24-bit bitmaps are accepted and some parameters (etch width, drill plunge depth) are hardcoded.

IoT Clamp Meter

Maplins sell a cheap clamp meter that fits neatly in the palm of your hand. However a much more useful device would be a network connected meter that could read both current and waveform information, therefore being a ‘smart meter’ that is part of the Internet of Things. One application would be to record a time trace of current for a given domestic appliance, for example a dishwasher through its operating cycle, aiding diagnostics for repair, especially if a previous trace exists of the machine working correctly.

Hardware Design

The block diagram of the adapted system is as follows:

Block diagram

This is produced by removing the existing circuit board from the clamp meter (only retaining the clamp sensor itself and the case) and filling the remainder of the case with a micro-controller and an Ethernet board. There were very few steps in the physical circuit, with most functions now in software. The physical circuit used an Op Amp to amplify the initial signal; the micro-controller (an Atmel AT Mega 328P) recording the voltage with one of its integral ADC lines; and an Ethernet interface (based on an ENC28J60) connecting the machine to the outside world. It no longer has a local display, although an added LED returns some status information. The In-system Programmer pins are available via a socket, allowing easy firmware changes. The link to the Ethernet is also via the same SPI interface used by the programmer.

These are some before and after pictures:

Before and After.png

Safety point. I believe the pick-up coil is a current transformer and this does mean some counter intuitive points for those used to more usual voltage supplies. For a voltage transformer, one expects to increase a resistance to decrease current and so limit things to safe values. A current transformer will instead endeavour to drive a constant current, and hence connecting it across a higher resistance (and an air gap can qualify as a high resistance) will generate more heat, not less (see the Safety section of the Wikipedia article)

In particular, using a current transformer from an older, non-digital meter seems likely to have a lower turn ratio and hence more dangerous secondary. You should be familiar with these effects before implementing something.

The initial pick up involved passing the transformed current through a resistor and using an Op Amp to sense the voltage drop across that resistor. The Op Amp was chosen to be a single-, rather than split- power device (in this case a LM358N) to allow the shared use of a 5V DC supply to power it and the micro-controller. The bias and gain on the Op Amp were set (via resistors) to achieve a clean waveform with nearly full scale deflection (i.e. not quite clipping at either end) for an input of around 80 Amps to the sensor. This is a credible figure to cover the range over which the main supply to a house might vary. Using a clamp meter on an individual appliance is not normally possible, because the current enters and returns in a single lead carrying both Live and Neutral. This is actually an opportunity however, because by implementing a short extension lead whose Live and Neutral are passed in multiple loops in opposite directions through a short loop of conduit which the clamp meter can encircle, one effectively makes an amplifier.

Adapter.png

Three wires in each direction creates a x6 amplifier, which means that a full scale deflection of 80A corresponds to about the 13A of a standard switched appliance, allowing more accurate measurements of individual appliances. Further this design means the meter need not itself physically switch scales, which is advantageous in avoiding an air gap for the current transformer (see the Safety point above).

The microcontroller needs very little ancillary hardware; mainly just capacitors, to stabilise the power supply and reset pins, and the status LED. The ENC28J60 provides a 25MHz clock, so the microcontroller does not need its own crystal circuit, but does divide that clock by two to operate within specification at 12.5MHz. Most of the microcontroller pins are unused. Given that there are several unused ADC pins, an obvious enhancement would be to provide mains-synchronised low-voltage AC power to the device, which it could rectify for its own power, but sense via a spare ADC channel to derive phase information to compare with the current traces.

Since the ENC28J60 was a 3.3V device, a voltage level shift was required between it and the micro-controller. In hindsight it would have been better to design it to use 3.3V throughout.

A particular problem with the specific ENC28J60 module I used was that it appeared to be mis-wired on the Ethernet side. Some Googling (now a dead link) suggested this was not an unknown problem, and in my case it was solved by producing a mis-wired Ethernet cable to compensate. Specifically I swapped wire 1 with 2 and wire 3 with 6 (a normal cross-over cable switches wires 1 with 3 and 2 with 6). Although modern self-sensing switches will correct such things, the direct connection to a PC (see below) may require it to be right : but other ENC28J60 modules may be correct anyway.

Software

The microcontroller is programmed in C. I had previously produced a full stack of TCP/IP/Ethernet code (on which I will blog later) for the AT Mega supporting various protocols. However only ARP, ICMP, DHCP and UDP are needed in this application. Although I had written a driver for a legacy ISA card for the data link layer (based on ideas from David Clausen), an ISA card was far too physically large for this device and I had to use the ENC28J60 for which I used Guido Socher’s driver (from Tuxgraphics). I would note that Tuxgraphics and David Clausen’s work were the primary inspiration for this device.

The adapted clamp meter uses a locally administered MAC address (as the ENC28J60 card does not come with a global one assigned) although any MAC address could be set in the software. It will seek a DHCP server over the Ethernet link and accept whatever IPv4 address it is given. If it does not find a DHCP sever, it can be setup to also use a defined, static, IP address on the LAN or to assign itself a known IP address in the APIPA range (169.254.x.y), which is the same behaviour as, for example, a Windows machine that cannot find a DHCP server. On this basis it can be used directly connected to a PC or laptop without a LAN, or can be used across a LAN supported by DHCP. To assist setup, it will respond to a ICMP PING (and, less human-friendly, an ARP packet) once it has an IP address.

The work specific to the meter application is relatively trivial. A bespoke UDP packet (i.e. with appropriate magic number) directed at the device will cause it to respond with a packet containing a set of samples from the ADC spanning one 50Hz cycle and a summed current value. Only one return packet is required as there are around 200, 16-bit samples. For memory related reasons the code limits packets to 536 byte payloads which, when combined with a 20 byte UDP header and a 20 byte IP header corresponds to the 576 minimum datagram size required for IPv4 compliance (RFC 791) without fragmentation. UDP packets can be transmitted in quick succession (<1s spacing) to obtain a real-time trace of current consumption.

Although it would likely be well within the capability of the AT Mega, any further processing can take place at the receiver end where far more compute power is likely to be available. For future enhancement, the packet specifies which ADC pin (from the choice of 0 to 5) the microcontroller should read, enabling perhaps voltage-phase information to be returned.

Since the clamp meter uses a 12.5MHz clock and the AT Mega can achieve its maximum 10 bit ADC resolution only at sampling rates below 200kHz, the sampling rate needs to be reduced and the AT Mega prescaler is used at its maximum setting (which allows the full voltage resolution) as well as taking groups of three samples but only recording one from each group. On this basis, 188 samples were found to correspond to a full 50Hz cycle.

Security

In principle, using items from the Internet of Things on your home network is the IT equivalent of allowing people into your house to rummage unsupervised. However the benefit of producing your own IoT devices is that you have tight control of their behaviour (the device will silently drop all packets that are not ARP; PING; expected DHCP responses; other UDP packets without the magic number or with malformed checksums etc). The code is also relatively short and easy to check. Importantly, and unlike many other devices such as home routers, the device cannot be re-programmed through the Ethernet interface. Instead it is flashed through the out-of-band SPI system (although the ENC28J60 is attached to SPI, it does not control the necessary reset line to initiate programming).

The microcontroller with relatively limited computing power will have no significant resistance to a DoS attack, and anyone who knows the magic number can determine your power consumption trace, but these are hardly material threats behind a firewall and possibly not even so if the device is directly connected to the Internet.

Results

Monitoring the main feed to the house shows both waveforms and current draw very clearly The main graph shows current draw over time, and the insets show the current flow over the AC cycles.

Trace 1

Initially there is low current background (probably mainly CFL lights), with a heavily distorted current spectrum. Although the cycle is clear, it is patently not sinusoidal. Adding significantly larger and mainly resistive loads of heating elements (kettle, dishwasher and (tungsten filament) security light) draws notably more current (enough that their individual switch on times can be seen) and this now approximates a sinusoid.

A second set of results show the pattern of the dishwasher heating element over time and the sinusoidal addition of a kettle. But at one point the microwave is used alone (other than background) and has a very distinctive current draw of its own.

Trace 2

Overall this should be a useful diagnostic tool.

Breaking Handycipher 2

I presented the solution to the decryption challenges in the first version of Handycipher, published at version 1.3, in this post.  That solution relied on identifying the nulls deterministically, by exploiting the fact that they could not abut one another.  The author has addressed this vulnerability in his version 2.1 posting.  Essentially v2.1 uses short groups of nulls as ‘escape sequences’ after which he inserts non-colinear groups.  This is a good method of frustrating the hill climbing in my first attack, because the climber would not be able to distinguish real correlations (which it exploits) from inverse correlations.  However all this is predicated on not being able to identify the nulls.  If they can be identified, then the solver actually has available the extra information of the inverse correlations.

Although I could no longer determine the nulls deterministically, I was able to determine them stochastically.  On the (assumed) basis that they were all equally probable, I checked for uniformity of the frequencies of all the presumed null groups (there are only 142,506 possibilities).  I also checked for the lengths of the ‘null prefixes’ generated, expecting them to be uniformly distributed in range 1 to 5 – this was actually the more powerful test (I used Kolmogorov–Smirnov for both).  This fairly quickly (after setting a threshold of significance) returns one clearly separated set of nulls for each challenge message.  As before, I ran it with a sliding window to extract the (rough) location of the master key.

I now needed to temporarily remove five characters to be safe (maximum is six, but the last two are co-linear) after each null I had found, to remove the ‘distraction’ letters, and the resultant text had the properties of the v1.3 cipher and could be solved using the code I produced before.  Once a likely key is determined, the message can be tidied by running an actual v2.1 decryption algorithm against it.

There’s another interesting wrinkle : I said I assumed linearity in the lengths of null prefixes.  In fact, they are not linear, clearly having a maximum around length 2 and tailing off by length 5.  Personal correspondence with the author shows that he is randomly using the 31 five-bit patterns (excluding 0000) to select the nulls.  This would indeed produce the binomial-like distribution of lengths seen.  Desite using the wrong filter (seeking a uniform, not a binomial) the divergence from all other character sets was clear enough to detect the nulls.

All of this is facilitated by the author being generous enough to use long (500-600) character plaintexts.  However it is better to know whether a cipher is really secure, rather than just good enough for a short message.

For the record, the keys (space represented by ‘^’) and messages are :

“YUTGRE^KNS-CF?ZLOMVDPBHA.JI,WXQ”

“IT HAUNTS ME, THE PASSAGE OF TIME. I THINK TIME IS A MERCILESS THING. I THINK LIFE IS A PROCESS OF BURNING ONESELF OUT AND TIME IS THE FIRE THAT BURNS YOU. BUT I THINK THE SPIRIT OF MAN IS A GOOD ADVERSARY.  — TENNESSEE WILLIAMS THE PROVERB SAYS —  BORN LUCKY, ALWAYS LUCKY — AND I AM VERY SUPERSTITIOUS. AS A SMALL BOY I WAS NOTORIOUSLY LUCKY. IT WAS USUAL FOR ONE OR TWO OF OUR LADS, PER ANNUM, TO GET DROWNED IN THE MISSISSIPPI OR IN BEAR CREEK, BUT I WAS PULLED OUT IN A TWO-THIRDS DROWNED CONITION NINE TIMES BEFORE I LEARNED TO SWIM, AND WAS CONSIDERED TO BE A CAT IN DISGUISE. — MARK TWAIN”

“XMLUQ?,IJ.^PKVTBNSEZDYWHC-ROAGF”

“MODERNISM AS CUMMINGS AND HIS MID-TWENTIETH-CENTURY COLLEAGUES EMBRACED IT HAD THREE PARTS. THE FIRST WAS THE METHOD OF USING SOUNDS INSTEAD OF MEANINGS TO CONNECT WORDS TO THE READERS FEELINGS. THE SECOND WAS THE IDEA OF STRIPPING AWAY ALL UNNECESSARY THINGS TO BRING ATTENTION TO FORM AND STRUCTURE — THE FORMERLY HIDDEN SKELETON OF A WORK WOULD NOW BE EXUBERANTLY VISIBLE. THE THIRD FACET OF MODERNISM WAS AN EMBRACE OF ADVERSITY. IN A WORLD SEDUCED BY EASY UNDERSTANDING, THE MODERNISTS BELIEVED THAT DIFFICULTY ENHANCED THE PLEASURES OF READING.   —  SUSAN CHEEVER, THE PRINCE OF PATCHIN PLACE”

These are different messages from the v1.3 paper, although the first is again prefaced by the same known plaintext as discussed in the paper.

Encrypting to known text with Tunny

There are several references on the web to the Bletchley cryptanalysts managing to set up a Tunny (Lorenz SZ40) cipher machine to encrypt the standard test phrase : “NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THE PARTY” into the start of the Wordsworth ode “I WANDERED LONELY AS A CLOUD THAT FLOATS ON HIGH OER VALES AND H”, thereby surprising a visitor during a demonstration.  To my knowledge, nowhere on the web is this feat explained, and this post attempts to rectify the situation by reverse engineering the problem.  It’s a fascinating little exercise and worth trying yourself. At face value the ability to do this transformation at all seems surprising.  Tunny uses standard teleprinter characters, i.e. an alphabet with 32 characters (5 holes/bits), and XORs the plaintext with a keystream to produce a given piece of ciphertext (i.e. uses the original Vernam cipher, as envisaged for a teleprinter).  For text of length n, the relevant keystream to achieve the feat will be one of the 32^n that exist. Since there are approximately 1.6×10^19 start positions (also the cycle length) for the 12 wheels, it would seem unlikely that the correct stream could possibly exist for texts of length greater than about 13 characters, never mind the chance of actually finding the right one.  However this calculation is of the correct form for the feat using the fixed wheels of Enigma, whereas Tunny had configurable wheels, each allowing a character by character configuration of the bit to use in the XOR (i.e. whether it is 0 or 1) via cams on the wheels, one per wheel position. Although this configuration would now make matters easy if the wheels were infinitely large, the smallest wheel in Tunny repeated after 23 steps (with one step per character encoded) and even the largest encoding wheel repeats after 59 steps.  The sample text above is 64 characters.  The problem looks tricky. At this point the basic structure of Tunny needs to be explained.  This explanation is derived from the book Colossus by B Jack Copeland and others.  Tunny uses twelve wheels, each with a number of steps per rotation co-prime to all other wheels, allowing a long period before repetition of the wheel state.  There is one of the cams referred to above for every wheel step.  The twelve wheels comprise two groups of five wheels and a control (‘motor’) pair.  Any given character of the 32 available is represented by five bits and encrypted by a further five keystream bits.  The five keystream bits are based on the XOR of the two groups of five wheels.  For the next character, the first group of five wheels (known as Chi wheels) would move on one position.  The second group (Psi wheels) might move on one position, dependent on the pattern from the control pair, but if they do move, they all move together. The two control (Mu) wheels had periods of 61 and 37, with the first always moving, and the second moving if the first had the cam set at the current position.  The Psi wheels move if the second control wheel has its relevant cam set. A diagram of the process is as follows: Tunny

Clearly the XOR between the Chi and Psi wheels means they are individually paired, with each pair affecting a given bit of the five in the teleprinter tape.  The wheel with the shortest period (23) referred to above is a Chi wheel, and is paired with (i.e. affects the same bit as) the longest period Psi wheel, of period 59.  This pattern is standard : the Chi and Psi wheels are paired in reverse order of length.  Specifically the (Chi, Psi) pairs are (23,59), (26,53), (29,51), (31,47) and (41,43). As noted, the Chi moves on every occasion so, while we may freely set all the cams on the Chis (and hence encode from arbitrary text to any other arbitrary text) for the first 23 characters, after that the first Chi starts to repeat; and the next starts to repeat three characters after that.  Hence after 23 characters, unless we are lucky enough to need a repeat, we will need to invoke the pattern on the Psis as a modifier.  Clearly the XOR nature of the relationship means the modifiers can still create all possible patterns (until the psi wheels themselves start to repeat). If we are only looking at the first pair, the sequence can be extended freely to 23 + 59 characters, i.e. 82 characters.  However since the Psis all move together, we are actually limited by a different constraint – effectively the shortest Psi wheel (43) dominates matters, after the shortest Chi wheel (23)) starts it rotating.  The sequence for which we have freedom to dictate the mapping is 66 (i.e. 23 + 43) characters, just over the length (64) of “NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THE PARTY” with spaces included, as they are encrypted too.

A diagram may help :

Tunny2

For defined, rather than arbitrary, text one could achieve slightly longer encodings by exploiting the fact that, if by chance the right keystream is ready, the Psi wheel advance can be deferred.  However with a 1 in 32 chance of the keystream being right for the next character, the expansion of 66 characters by this method is limited. For carefully chosen text one might be able to achieve even longer conversions, but I would suspect not much longer (other than for trivial self-similar copies).  An exercise, perhaps, for the reader. To make things work, we therefore need 23 occasions where the Chi wheel dominates the solution (i.e. can be freely set) and on the remaining 41 occasions the Psi wheel is the one freely set. In terms of a practical implementation, one might proceed with a known (arbitrary) Psi wheel setting and set the cams on the Chi wheels for the first 23 characters such that they perform the desired XOR function at this known Psi wheel setting, ensuring that the Psis do not move for 23 characters (i.e. first 22 cams of mu61 are unset).  The next cam of mu61 should be set, and no further cams need be set.  Note that mu61 will have started to repeat just before character 64, but repeats unset cams.  The single set cam should move mu37 into a position where its own cam is set.  This means that the Psis start (and continue) to move after the 23rd character.  The now moving Psis now dictate the pattern – especially for the 23 wheel, which is immediately repeating its Chi sequence – the Psis must be set to XOR-out the effect of the original Chi sequence.  Overall this pattern (23 with Psi static, followed by 41 with Psi moving) is probably the simplest solution, but meets the aim.  Were the five cams for the first Psi setting to be set to a pattern, rather than all unset, the pattern less obviously encode the start of the text on the wheels. The wheel setting, proceeding from the left, with ‘x’ indicating a set cam are as follows :

Chi wheels

Wheel of size 41 ...x..x.x.........xxxx...................

Wheel of size 31 x..xx..x...xx..x....x.x........

Wheel of size 29 .x.x.xx.x...x..x.x..x.x......

Wheel of size 26 xx..xx.x.x..x...xxx.......

Wheel of size 23 .x.....xx...xx.xxx...xx

Motor wheels

Wheel of size 37 .x...................................

Wheel of size 61 .....................x.......................................

Psi wheels

Wheel of size 43 ....x...xx..x..x......x.xxxx...xx.x.....xx.

Wheel of size 47 .x..x.......x.xxx...xx.x..xx....xx....xxx......

Wheel of size 51 ....x..x.x.x..xxxx.xx...x.x...x.x.xx..xxxx.........

Wheel of size 53 .xx..x.xx..x.x.x...xxxx.x..xxx.x..xx.xxx.............

Wheel of size 59 ..x.x....xxxx..xxxx.x.x.x..x..x.x.x.x..x.x.................

and this, if my implementation of Tunny is correct, changes between: I WANDERED LONELY AS A CLOUD THAT FLOATS ON HIGH OER VALES AND H NOW IS THE TIME FOR ALL GOOD MEN TO COME TO THE AID OF THE PARTY Note that the motor advance was subject to ‘limitations’ in later models (SZ42A and SZ42B) which changed the pattern slightly.  It is not considered that this changes the above analysis in a material way – i.e. the basic ideas are constant.  Some limitations were dependent on the plain text and would however affect the ability to define an arbitrary transformation. As far as I can tell my implementation of Tunny is correct, but I would welcome any observations or corrections.   The code, in Java, is Tunny.   It is a .docx because of WordPress limitations. You will need to copy it via the clipboard to a text file named Tunny.java.