Breaking Handycipher – the method

Decryption keys

[Update – this post from 19 June 2014 applies to v1.3 of the Cipher – the 20th April 2014 publication.  Find that on the IACR archive.   The original link now (July 2014) points to version 2.1, which has been updated in the light of this attack method.]

To cut straight to the chase, the decryption keys for the challenge messages in the Handycipher paper are (with space represented by ^):

C1/M1 : “LFPTDWXCYS^UIEKVR?-QBGHO,ZNMJ.A”
C2/M2 : “GVTZBK.^WQLR?-FMD,AOPXNSJICYEUH”

and the decryptions are in my last post. Refer to that instead if you don’t want method spoilers.

Solution method.

The first point to exploit is that nulls stand as singletons and cannot abut nulls. Using this fact, a relatively simple piece of computer code can effectively determine which of the 30 characters are the five nulls, given a reasonable length of ciphertext. There are five selections from thirty = 142,506 possible groups of five nulls, and a computer can quickly (seconds) test all of them, excluding those that are impossible. For plaintext of length 506 characters (the size of the ‘challenge’ texts in the paper) experiment suggests 10 or 20 possibilities usually remain. For text of length 1000 characters, there is usually only one possibility. Interestingly the challenge texts are helpfully limited to two (for C1) or just one (for C2) possibilities, making them (presumably by chance) slightly easier to decode, albeit that, in my test examples where there are a dozen possible solutions, they are usually closely clustered – e.g. three of five letters are common to all possibilities.

The analysis is slightly complicated by the existence of a run of about 85 characters at an unknown position in the ciphertext encoded by a different cipher (and hence invalidating the characteristics sought) – this is the encoding of the key. However performing the test using a running window excluding a group of this size allows the analysis to continue (albeit taking longer). This process has the beneficial side-effect of largely locating the inserted key, which could be attacked, or simply temporarily removed to continue the analysis on the text. Compute time so far is still a few minutes at worst and could be improved.

The second exploitable point is that adjacent letters in the cipher text have a greater than random chance of being in the same row, column or diagonal as each other, although we do not know which row, column etc or whether it is a row, column or diagonal (or even whether they actually are, it is only a statistical correlation).

By using the identified nulls to delimit the text, we obtain the remaining adjacent pairs of letters. A pair of letters that appears in the list has an approximately 60% chance of lying in the same row, column or diagonal compared to a 40% chance if randomly selected. This difference can be exploited (as can the fact that a repeated pair is even more likely to be related).

A Monte-Carlo hill-climbing optimisation was now used to select the key-matrix with the maximum likelihood, based on the above assumptions about the probabilities of pairs lying in a line. Importantly experiments for the plain text size of the challenge messages (506 characters) showed that greater than 80% converged to the correct solution within a few hundred replications (hill-climb restarts), and importantly and unusually, none found a solution with a better likelihood. Hence the method is highly robust. This is important for an intermediate step in the break, where there is no other confidence that the correct avenue is being pursued.

When used in anger, on an unknown ciphertext, it is clearly unknown whether the solution is correct and exact (80+%) or wrong, but extra replications may suffice to resolve this. This has not been examined, and it might alternatively be the case that some key-text combinations do not lead to convergence, however a cipher where 80% of messages (of length 506) can be broken is broken anyway.

The optimisation can take some time – perhaps an hour for certainty.

While the solution is exact in that there is no more optimum one and the key is directly related to it, it cannot be distinguished by the Monte-Carlo analysis from certain permutations and reflections of itself. While not proven analytically, the solutions seem to be based on a reflection (or not) in the leading diagonal, then any permutation of the four rows excluding the middle one, followed by any permutation of the four likewise columns. These lead to 2 x 4! x 4! or 1,152 possibilities. This is few enough that all can be tested without determining if some should be excluded.

There are two remaining uncertainties. We know the five nulls, but not their ordering, hence there are 5! = 120 further possibilities to test. These could be ignored – the solution will likely be readable anyway if there are five characters that are permuted among themselves, but the extra work factor of 120 is not arduous for a computer and may help the computer get the right solution by improving the correlation with text.

There is a further, more significant factor. We know the key-grid (5×6 = 30 characters), or at least we know it to within 138,240 possibilities (1,152 x 120). The final key step uses the same character ordering with a space inserted somewhere as a monoalphabetic key. We have no way of knowing where this space is from the analysis so far, and, if the space is at the wrong place in the key, many characters may be wrong. However there are only 31 places for the space to be, and the computer can try all. The problem remaining now is therefore to optimise the selection of the parameters such that the result bears maximum resemblance to (English?) text, noting that spaces (a very common character) are included. We have the finite space to search of 1,152 x 120 x 31 = 4,285,440 possibilities, which, while daunting for a human, is a few minutes work for a computer. A non-exact solution (i.e. the wrong choice) is still very likely to be understandable text, merely needing final polishing.

In addition, we have removed a non-exact number of characters for the representation of the key. It is relatively easy to remove the correct number by trial and error once the solution can be read as English text, but this could be vulnerable to error – in the worst case a ‘NOT’ might either belong in a sentence or not!

An interesting exercise on an interesting cipher.

Stuart Combes

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s