TDD-Deciphered.com

Part 1: Developing an Enigma Simulator with PHPUnit

26/12/2009
Back in August this year, I finally visited Bletchley Park near Milton Keynes, somewhere I'd been meaning to go for years but had never quite got around to. It proved to be a fascinating day out, and I've since developed a real interest in the Park and its history.

For those unfamiliar with the place, Bletchley Park was the site of the British Government Code and Cypher School in the Second World War, the place where German and Japanese secret codes were broken, giving the Allies vital information to help shorten and finally win the war. The most famous of these code systems was the German 'Enigma' machine, now seen in a number of books and films, but until not so long ago a heavily classified government secret. Bletchley also developed Colossus, one of the world's first electronic computers, used not to break Enigma, but rather the "Tunny" machine cipher used by the leaders and generals of the Third Reich for their highest-level strategic communications. A working replica of Colossus 2 can now be seen at the National Museum of Computing, which shares the site at Bletchley.

The actual history of what happened at Bletchley is not widely known due both to the technical nature of the work and the incredible secrecy that surrounded it for so long. I'll include a certain amount of background information in this series, but for those wanting more information, these books are a good place to start.

The main thing I'll be talking about here is the Enigma Machine itself, as I build a simulation of it in PHP using the 'Test-Driven Development' technique. I hope this will provide some useful examples for anyone interested in adopting this technique, and also give you an insight into the fascinating history of those who used and broke these codes.

To start with, a little background:

What is, or was, "Enigma"?


"Enigma" was the name given to a range of rotor-based electromechanical encoder/decoders designed to encode plain text in a supposedly unbreakable form. The Enigma cipher was symmetrical - in other words, encoding a message and decoding it worked in exactly the same way, so the same machines, set up the same way, could both encode and decode messages.
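
To make the "symmetrical" property concrete, here is a minimal sketch in PHP. It is not the simulator's code, and the letter pairings are invented for illustration rather than taken from any real Enigma wiring. The function names (buildReciprocalMap, applyMap) are likewise hypothetical, and the syntax assumes a current PHP version (7.1 or later). The real machine also changes its substitution with every key press, which this static table ignores; the sketch only shows why a mapping built from swapped pairs undoes itself.

```php
<?php
// Hypothetical illustration only: these pairings are NOT real Enigma wiring.

// Build a lookup table in which every pair maps both ways (A->T and T->A).
function buildReciprocalMap(array $pairs): array
{
    $map = [];
    foreach ($pairs as [$a, $b]) {
        $map[$a] = $b;
        $map[$b] = $a;
    }
    return $map;
}

// Substitute each character using the table; unmapped characters pass through.
function applyMap(array $map, string $text): string
{
    $out = '';
    foreach (str_split($text) as $char) {
        $out .= $map[$char] ?? $char;
    }
    return $out;
}

$map = buildReciprocalMap([['A', 'T'], ['C', 'K'], ['E', 'X']]);

$cipher = applyMap($map, 'ATTACK');   // encode
$plain  = applyMap($map, $cipher);    // "decode" by running the same step again

echo $cipher, PHP_EOL; // TAATKC
echo $plain, PHP_EOL;  // ATTACK
```

Because every pair is swapped in both directions, applying the same table twice always returns the original text, which is the sense in which one machine, set up one way, can serve both sender and receiver.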

The Enigma system was invented in Germany in the 1920s as a commercial system, but was adopted and modified by the German National Socialist (Nazi) party and military, and used as the primary form of strong cipher for tactical messages during the Second World War by Axis forces. This was a mobile system (the Enigma machine is about the size of a typical desktop computer) used (in various forms) by Army units, the Luftwaffe, and German ships and U-boats.

The repeated breaking of the Enigma ciphers and their successors was one of the most important, most incredible and most secret intellectual feats of the Allied forces in the war. It was a co-operative effort by cryptographers from many nations, particularly Polish, French, British and American. The effect of the Allied ability to read large amounts of tactical enemy communications (completely unknown to the Axis, who believed the code unbreakable and unbroken until their defeat and beyond) was to drastically increase the effectiveness of Allied military and intelligence operations. It is often suggested that the codebreaking program shortened the war by two years, both by helping track and defeat the U-boats in the Battle of the Atlantic, and by giving masses of detail on German troop strengths, dispositions and beliefs that helped make the D-Day landings possible.

Why simulate it?


Well, the historical background alone makes the system fascinating to study, but it appeals to me for a number of technical reasons, too. One is that it's a system which appears incredibly complex at first glance, but breaks down under careful analysis into clear and understandable parts. It's also a task particularly suited (for reasons that should become clear below) to Test-Driven Development, as we build and test small pieces to construct a complete system. That's a technique I want both to be able to explain and to understand better myself. Once the code and algorithms are built, I also want to learn to implement them as an interactive web front-end, and possibly as an iPhone application.

What is "Test-Driven Development" anyway?


TDD, in its strictest form, is a practice in which, before each "unit" of functionality is written, tests are written to check that the new code meets its requirements. These tests can then be kept and used to catch any subsequent bugs (or "regressions") that might later sneak into the code. TDD code is therefore known to pass its tests at the time of creation and, whenever the tests are run again, at any later point too.
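
As a rough sketch of that test-first cycle, the example below uses plain PHP so that it runs on its own. The shiftLetter function and the expectSame helper are hypothetical names made up for this illustration, not part of the simulator's design, and the syntax assumes PHP 7.1 or later. In PHPUnit, the same checks would normally be assertSame calls inside a test case class; the hand-rolled helper here is only a stand-in for that.

```php
<?php
// Hypothetical helper for illustration; not part of the simulator's design.
// Advances an uppercase letter by a non-negative number of steps,
// wrapping round from Z back to A.
function shiftLetter(string $letter, int $steps): string
{
    $offset = (ord($letter) - ord('A') + $steps) % 26;
    return chr(ord('A') + $offset);
}

// Tiny stand-in for a test framework assertion: throws if the values differ.
function expectSame(string $expected, string $actual): void
{
    if ($expected !== $actual) {
        throw new RuntimeException("Expected '$expected', got '$actual'");
    }
}

// In a test-first workflow these checks are written before shiftLetter exists,
// fail at first, and then pass once the function above has been implemented.
expectSame('B', shiftLetter('A', 1));   // simple step
expectSame('A', shiftLetter('Z', 1));   // wrap-around at the end of the alphabet
expectSame('A', shiftLetter('A', 26));  // a full turn returns to the start

echo "All checks passed", PHP_EOL;
```

Keeping the three checks alongside the code is what provides the regression safety net: if a later change breaks the wrap-around case, rerunning them fails immediately.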

In terms of this project, I'm planning to test all the simple elements (to be detailed later) to build a larger complex system that I can trust to be reliable.

For more detail on TDD in general, see http://en.wikipedia.org/wiki/Test-driven_development . I'm not planning to use 100% pure TDD here, but it'll be fairly close.

Technical Goals


I'm not going to try and teach basic PHP with this project – that's been done elsewhere. I want to concentrate on a few more advanced aspects of the development process with which I'm reasonably familiar, but which I haven't had the chance to explore in depth across a whole, self-contained project.

The areas I'm aiming to look at are:

- Strong object encapsulation and good architectural design, to provide a slightly more convincing case for object-based code than the traditional "cars have wheels" explanation.

- Test-driven development on an evolving system, as this is another area where the usual examples are far too simple to be useful or convincing. Here, we have a complex system of many parts which produces outputs that simply cannot be validated by eye.

- Good practice in version control, so we know our code's safe (and so that I can easily refer to steps of the development process).

- Good code commenting and clean coding style, so that we can look at a block of code and rapidly understand it without having to crawl around the system logic.

There are also a few things I don't intend to do:

- Use a database, or any form of persistent storage. It's not required for a project like this.

- Use a framework in the simulator. I use Zend Framework for almost all my other projects, and I'm reasonably fond of it. However, I want the simulator to be fully self-contained code, so that readers don't have to learn an external toolkit to follow it.

- Use a web server, at least until the simulator works and we want to add a web interface. This system will be built, tested and run at the command line - so I'll need to make the tests comprehensive.

- Use a debugger. This is in no small part because I've never fully learned to use one, and it's not something I want to learn as part of this particular project. However, it's also a conscious decision not to have to crawl around inside the guts of a system when I'd rather build it to be internally robust and rigorously tested. In other words, TDD is about not putting the bugs in, rather than having to chase them out. Realistically, bug-free code is unlikely, but we'll see where it gets us.

Oh, and I'm not going to try and fully specify how the project will work before I start coding. I know the principles of how an Enigma machine works, but not the details, so I'll be learning implementation details as I go. I need to do just enough up-front design to build a sensible structure before I dive into the detail.


In the second instalment in this series, I'll take a look at the physical architecture of an Enigma machine, and sketch out the basic design of the object model onto which I'll map it.

All content copyright Richard George (richard@phase.org), 2009-2010

Sponsored links to recommended books: