Cracking the Enigma
First published in the inaugural issue of Software Development's Secure Start Newsletter in February 2003.
Encryption, World War II and the need for obfuscation.
In any security model, protecting data is an obvious priority. This has been true for centuries and is still true today. Long ago, encryption schemes were often trivial, because the decryption mechanism was a person on the receiving end of the message who knew how to "undo" the cipher. Even relatively simple encryption foiled the enemy.
In more recent times, encryption and decryption have been left to machines. This allowed us to create far more complex encryption schemes that protect data to a much higher degree. However, it also added another layer of vulnerability. Before machine encryption, the only two points of compromise were the encrypted data and the human receiving (or sending) it. Machines add a third, middle layer. Although the machine can't necessarily tell you the encrypted message, it can unwittingly divulge information about how it performs the encryption or decryption. That information can be a springboard for would-be infiltrators.
An Enigma
One of the first famous encryption machines was the Enigma machine, used before and during World War II. At first, the Enigma machine seemed impenetrable. One Enigma machine encrypted the message, and another on the receiving end decrypted it. Observers who intercepted the data in transit had no hope of deciphering the message.
The Enigma machine's encryption was eventually cracked (a fact that was kept secret for many years afterward). The Polish mathematicians who cracked Enigma used one of the same seemingly obvious methods that modern crackers use on software: if the data itself gives no clues about how it can be decrypted, examine the machine. Polish intelligence reportedly intercepted an Enigma machine passing through customs, and the information gleaned from understanding the machine (i.e., its algorithm) gave them vital clues to cracking encrypted messages.
Modern-day hackers use the same technique: if the data yields no clues, reverse-engineer the program. Even if the encryption algorithm is inherently secure, the hackers have at least learned which algorithm is being used. In fact, they can lift the actual program code and reuse it as part of a brute-force decryption program. Just as the Enigma machine could not help revealing significant clues about how it worked simply by being examined, neither can your company's programs.
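To make the brute-force point concrete, here is a toy Java sketch. It assumes the attacker has already recovered a program's decryption routine. The routine shown is a hypothetical Caesar shift, chosen only because its 26-key keyspace can be searched instantly. Real ciphers have astronomically larger keyspaces, so this illustrates the workflow rather than the difficulty. The loop also checks each candidate for a guessed word (a "crib"), a trick wartime codebreakers used as well.

```java
import java.util.ArrayList;
import java.util.List;

public class KnownAlgorithmBruteForce {

    // Hypothetical decryption routine "lifted" from a decompiled program.
    // A Caesar shift over A-Z stands in for a real cipher; other characters pass through.
    static String decrypt(String ciphertext, int key) {
        StringBuilder out = new StringBuilder();
        for (char c : ciphertext.toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                out.append((char) ('A' + (c - 'A' - key + 26) % 26));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Inverse of decrypt, included only so the example can build its own ciphertext.
    static String encrypt(String plaintext, int key) {
        return decrypt(plaintext, 26 - key);
    }

    // Try every key with the program's own routine; keep keys whose output contains the crib.
    static List<Integer> bruteForce(String ciphertext, String crib) {
        List<Integer> hits = new ArrayList<>();
        for (int key = 0; key < 26; key++) {
            if (decrypt(ciphertext, key).contains(crib)) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String ciphertext = encrypt("WEATHER REPORT FOLLOWS", 7);
        for (int key : bruteForce(ciphertext, "WEATHER")) {
            System.out.println(key + ": " + decrypt(ciphertext, key));
        }
    }
}
```

With this input, only key 7 survives the crib test. Note that the attacker never had to analyze the cipher mathematically. Reusing the program's own routine was enough.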
Ironically, once the Enigma machine was cracked, other scientists used that information to create better encryption machines. In business terms, the intellectual property embodied in the machine's design was stolen to make competing products without spending as much on R&D. Like any software your company produces, the software itself is valuable intellectual property. Competitors who can examine your programs can see how you do things. Whether that means understanding your order-entry process or copying your new product's features, the ability to exploit that information is real.
A New Age of Computer Programs
Although this security hole has always existed in computer software, only recently has it become a significant problem. Specifically, the exposure is far greater in newer computing environments such as Java and Microsoft's .NET (pronounced "dot-net"). Programs written in older languages such as C++ or Pascal are converted (i.e., compiled) into the machine language of a particular computer before they are distributed. Programmers write "source code" and computers run "machine code," so this conversion is necessary. It is (generally) a one-way transformation from source code to machine code.
Machine code is not encrypted and is easy for anyone to see, but the format is so tedious for humans that reverse-engineering efforts are slow and painful. The inherent defense of distributing programs in machine code is that the effort required to reverse-engineer them in full rarely pays off. This isn't to say it has never been done. It's just that your competitors could probably build a competing product from scratch sooner than they could reverse-engineer your machine-language program. In short, releasing machine code to the world is generally considered an acceptable risk (every program you buy is machine code), but you'd be crazy to release your source code.
Java and .NET languages (C#, Visual Basic .NET, etc.) take a different approach to compilation. From a programmer's point of view, these languages are powerful, flexible, and understandable, and the paradigm they follow allows a dramatic gain in programmer productivity. Unfortunately, that same flexibility and understandability make them far easier to reverse-engineer than older languages. Their flexibility exists in part because they do not compile to machine code. Instead, they compile into something in between source and machine code, called intermediate code (Java bytecode, or Microsoft Intermediate Language in .NET).
The trade-off is that if Java and .NET did away with intermediate code, they would give up the huge advantage in flexibility and productivity they have over previous languages. However, intermediate code is easy to decompile, that is, to convert back into source code. In fact, the process can be automated: programs called "decompilers" rapidly convert compiled programs back into their original language (Java, C#, etc.). Once a hacker has your source code, he can see how your application does its work, find embedded passwords, and even change a few lines and rebrand the product as his own. In every respect, he has the crown jewels.
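The following Java sketch shows how much of the original source survives compilation to intermediate code. It uses the standard reflection API rather than a real decompiler. However, reflection reads the same class, method, and field names that a decompiler relies on. The OrderProcessor class, its members, and the password are invented for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class WhatBytecodeKeeps {

    // Hypothetical application class; every name here is invented for illustration.
    static class OrderProcessor {
        private static final String ADMIN_PASSWORD = "s3cret"; // an embedded secret (bad practice)
        private double discountRate = 0.15;

        double applyLoyaltyDiscount(double total) {
            return total * (1.0 - discountRate);
        }
    }

    // Collects the method and field names that were stored in the compiled class.
    static List<String> memberNames(Class<?> c) {
        List<String> names = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            names.add(m.getName());
        }
        for (Field f : c.getDeclaredFields()) {
            names.add(f.getName());
        }
        Collections.sort(names);
        return names;
    }

    // Reads a static field even though it is declared private (access checks suppressed).
    static Object readPrivateStatic(Class<?> c, String fieldName) {
        try {
            Field f = c.getDeclaredField(fieldName);
            f.setAccessible(true);
            return f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not read " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("members: " + memberNames(OrderProcessor.class));
        System.out.println("secret:  " + readPrivateStatic(OrderProcessor.class, "ADMIN_PASSWORD"));
    }
}
```

The JDK's own javap tool (for example, javap -p -c on the compiled class file) exposes the same names plus each method's bytecode. Decompilers go one step further and turn that bytecode back into readable Java.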
The Solution
Currently, there is no effective way to encrypt a computer program itself. A computer can't run an encrypted program; it needs to see the instructions. Fortunately, a technology analogous to encryption exists for computer programs: "obfuscation." Obfuscation is the art of shrouding the facts. It restructures intermediate-code programs so that their behavior does not change when a computer runs them, while making reverse-engineering by hand tedious again. This is possible because computers execute a program one instruction at a time and never need to consider it as a whole. Decompilers, by contrast, must look at much more than one instruction at a time; after all, they are trying to reconstruct the program for a human.
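One common restructuring technique is control-flow flattening. The hand-written Java sketch below is illustrative only and is not output from Dotfuscator or any other specific product. It rewrites a simple loop as a dispatcher loop driven by a state variable. The computer still executes one small step at a time and gets the same result, but the loop-and-if structure a decompiler tries to rebuild is gone.

```java
public class FlatteningSketch {

    // Original: a straightforward structured loop (sum of the even numbers 0..n).
    static int sumOfEvens(int n) {
        int total = 0;
        for (int i = 0; i <= n; i++) {
            if (i % 2 == 0) {
                total += i;
            }
        }
        return total;
    }

    // Flattened: the same computation as a dispatcher loop over a state variable.
    // Each case is one trivial step; "break" leaves the switch, not the loop.
    static int flattened(int n) {
        int total = 0, i = 0, state = 0;
        while (true) {
            switch (state) {
                case 0: i = 0; state = 1; break;                  // loop initialization
                case 1: state = (i <= n) ? 2 : 5; break;          // loop test
                case 2: state = (i % 2 == 0) ? 3 : 4; break;      // the if condition
                case 3: total += i; state = 4; break;             // the if body
                case 4: i++; state = 1; break;                    // loop increment
                case 5: return total;                             // exit
                default: throw new IllegalStateException("unreachable state " + state);
            }
        }
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 10; n++) {
            System.out.println(n + " -> " + sumOfEvens(n) + " / " + flattened(n));
        }
    }
}
```

A determined analyst can still work out what flattened code does, and some deobfuscation tools can undo simple cases. The goal, as with obfuscation generally, is to make that work slow and tedious rather than impossible.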
Obfuscation restructures a program after the programmer is done with it but before the end user sees it. It retains the flexibility of intermediate code but destroys the clues decompilers use to reconstruct source code. At deep levels of obfuscation, decompilers can become worthless.
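Another common technique is identifier renaming, which removes one of the richest clues a decompiler recovers. In this Java sketch, written by hand to mimic an obfuscator's output, the two methods behave identically and only the names differ. The license-key rule is invented for illustration.

```java
public class RenamingSketch {

    // Before obfuscation: descriptive names that a decompiler would recover intact.
    // The checksum rule is hypothetical, invented purely for this example.
    static boolean isValidLicenseKey(String licenseKey) {
        int checksum = 0;
        for (char ch : licenseKey.toCharArray()) {
            checksum = (checksum * 31 + ch) % 9973;
        }
        return licenseKey.length() == 16 && checksum % 7 == 0;
    }

    // After renaming: identical logic, but every identifier the obfuscator is
    // free to change has been replaced with a meaningless one.
    static boolean a(String a) {
        int b = 0;
        for (char c : a.toCharArray()) {
            b = (b * 31 + c) % 9973;
        }
        return a.length() == 16 && b % 7 == 0;
    }

    public static void main(String[] args) {
        String[] samples = {"ABCD-EFGH-IJKL-M", "0000000000000000", "short"};
        for (String s : samples) {
            System.out.println(s + ": " + isValidLicenseKey(s) + " / " + a(s));
        }
    }
}
```

Some names cannot be changed: platform calls such as toCharArray, the main entry point, and any public API or member that other code reaches by name. Real obfuscators apply renaming to the compiled intermediate code rather than to source. They can also go further, for example by reusing one short name for many unrelated methods that differ only in their parameter types.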
This issue has not gone unnoticed by big computing companies. Both Sun Microsystems and RSA Security, Inc. obfuscate their Java encryption libraries. Microsoft has struck a deal with PreEmptive Solutions (a pioneering obfuscation and security company) to include its Dotfuscator™ CE obfuscation product in every copy of the Visual Studio .NET developer suite. In other words, Microsoft recognizes that the threat of reverse-engineering is significant and is giving developers a solution from day one (your job is to use it!). The CE version of Dotfuscator included in Visual Studio provides adequate protection for non-mission-critical applications. PreEmptive also sells a Professional version with more complete protection for enterprise and commercial applications.
If you encrypt your data but don't obfuscate your code, then every time you send critical data into the open, you are handing hackers the very tools they need to read your message. If you release a new product without obfuscation, a smart competitor could have your source code in front of them within a few hours. The answer is simple: if your data is worth protecting, so is your program.
Java and .NET will be the primary computing platforms in the coming years, and obfuscation is already an integral part of both. Whether it serves as an extra layer protecting encrypted data or protects the intellectual property of the program itself, obfuscation is a required technology in these new computing paradigms.
Paul Tyma is Chief Scientist of PreEmptive Solutions, a software security firm that helps businesses reduce risk.