The Substitution Breaker is now Open Source


I have finally put a Python implementation of the substitution breaker on GitLab. For performance reasons the online version here is implemented in C, but I thought it would be nice to publish a Python variant instead. Both versions use the same algorithm, so their accuracy is identical.

The Python implementation provides a set of basic operations for working with substitution ciphers. All functions are available through a CLI as well as through a set of Python classes that other Python applications can use. Another nice feature (I think): the substitution breaker is not limited to the Latin alphabet (a..z) but supports any set of characters. For breaking ciphers without knowing the key, the only restriction is the length of the alphabet: it may not exceed 32 characters. This means the implementation is able to break ciphers using, for example, the Cyrillic alphabet.

Support for any language can be added easily. This only requires a text corpus, which is used to generate the quadgrams that are an essential part of the breaker. A text corpus is a large collection of text in a given language, used for statistical analysis. By default the breaker currently supports only English, but anyone can add their own favourite language.
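To give a rough idea of what generating quadgrams from a corpus means, here is a minimal sketch. It is not the code from the repository: the function names `build_quadgrams`, `fitness` and `substitute` are made up for illustration, and the floor value for unseen quadgrams is an arbitrary assumption. It counts every run of four consecutive alphabet characters in a corpus, turns the counts into log probabilities, and uses them to score how much a candidate plaintext resembles the language.

```python
import math
from collections import Counter


def build_quadgrams(corpus, alphabet):
    """Return (log_probs, floor) for all quadgrams found in `corpus`.

    Hypothetical helper: characters outside `alphabet` are dropped,
    and `alphabet` is assumed to be given in lowercase.
    """
    allowed = set(alphabet)
    text = "".join(ch for ch in corpus.lower() if ch in allowed)
    counts = Counter(text[i:i + 4] for i in range(len(text) - 3))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("corpus is too short to contain a quadgram")
    log_probs = {q: math.log10(c / total) for q, c in counts.items()}
    # Assumed penalty for quadgrams never seen in the corpus.
    floor = math.log10(0.01 / total)
    return log_probs, floor


def fitness(text, log_probs, floor):
    """Sum the log probabilities of all quadgrams in `text` (higher is better)."""
    return sum(log_probs.get(text[i:i + 4], floor)
               for i in range(len(text) - 3))


def substitute(text, alphabet, key):
    """Apply a substitution: alphabet[i] is replaced by key[i]."""
    if len(alphabet) != len(key):
        raise ValueError("alphabet and key must have the same length")
    return text.translate(str.maketrans(alphabet, key))
```

Since nothing here is tied to a..z, the same sketch works for a Cyrillic alphabet as long as the corpus is in the matching language. A breaker would then search for the key whose decryption gets the highest `fitness` value.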

I plan to add more documentation on using the breaker to the Git repository soon.