Reverse Engineering for "Regular People" - How are cracked versions of software created and why are developers not able to prevent it?

Intro

I originally wrote this article for Quora. This is an updated and expanded version, written without the constraints of the Quora platform. The article is primarily intended for "normal people", that is, non-engineers who don't spend their days figuring out how software works and are simply interested in learning the basic technical concepts behind "software cracking" and "software piracy".

Basic Concepts

By "software cracking", or just "cracking" for brevity, I mean the process by which the original piece of software (be it a paid version of a commercial application, or a game) is modified in a way that its developer did not intend. Such modification usually entails removing the parts of the original software that verify the software license (for commercial software), or giving a certain player an unfair advantage in a game.

By "source code", I mean the human-readable text of the software's instructions from which the software is compiled. "Compilation", in turn, is the process of converting the source code into executable machine instructions that a computer can read and execute to "run the software". By its nature, source code is much easier for a human to read and understand than compiled machine code, which is why source code files are a closely guarded secret for many commercial software products. (By contrast, "open-source" software publishes its source code for everyone to see.)

Cracking Software

Cracked versions of software are created with the help of debuggers. (A debugger is a special type of software that lets programmers pause a running program, step through it instruction by instruction, and inspect its memory in order to find bugs, hence "de-bugging". Debuggers can also be used for reverse engineering, the process that lets researchers see what is inside the software and learn its logic. Reverse engineering is used mostly by malware researchers to study what malware (or computer viruses) does on the inside. But an attacker can also use it to "crack" (or bypass) the registration of legitimate software, or at times to alter its normal behavior, for instance by injecting malicious code into it.)

For the sake of this example, I will assume that the software being "cracked" was compiled into native code, and is not a .NET or JavaScript-based application. (Otherwise viewing its source code, or something very close to it, would be fairly trivial.) Compiled native code is a trickier "beast" to study. (Native means that the code is executed directly by the CPU, GPU, or other hardware.)

So let's assume that the attacker's goal is to bypass the registration logic in the software so that he or she doesn't have to pay for it. (Later, for the lolz, he or she may also post the "crack" on some shady online forum or torrent site so that others can "use" it too and show their appreciation.)

For simplicity, let's assume that the original logic that checks the software registration was written in C++ and looked something like the following code snippet:


if(!isRegistrationCodeGood(RegistrationName, RegistrationCode))
{
	MessageBox(hWnd,
		L"Sorry, your registration code is incorrect. Please try again.",
		L"Registration Error",
		MB_ICONERROR | MB_OK);

	return;
}

rememberRegistrationParameters(RegistrationName, RegistrationCode);

MessageBox(hWnd,
	L"Your registration is complete. Thank you for registering!",
	L"Registration Success",
	MB_ICONINFORMATION | MB_OK);

In this code sample, RegistrationName and RegistrationCode are strings of text that a legitimate user receives after paying for the license. (The name is usually the person's actual name or email address, and the code is a string of unique characters that is tied to that name.)

In the logic above, the function named isRegistrationCodeGood() checks whether RegistrationName and RegistrationCode are valid using some proprietary method. If they are, it returns true; otherwise it returns false. That result dictates which branch (or scope) execution follows next.

So the logic above will either report that the registration failed and return without saving anything:

Sorry, your registration code is incorrect. Please try again.

Or, if the registration name and code match, it will save the registration details to persistent storage (such as a file on disk or the System Registry) using the function named rememberRegistrationParameters(), and then display a message thanking the user for registering:

Your registration is complete. Thank you for registering!

A "cracker" (my lazy shorthand for "software cracker") obviously wants to get the second result for any registration code he or she enters. But there is a problem: crackers do not have the C++ source code, part of which I showed above. (I hope not!)

So the attacker's only recourse is to disassemble the binary code, which always ships with the software: as .exe and .dll files on Windows, and mostly as Mach-O executables inside .app bundles on a Mac. The attacker then uses a debugger to study the binary code and try to locate the registration logic shown above.

Next you can see the flowchart for the C++ snippet above as presented by a low-level disassembler, in other words, how the code looks in binary form after compilation:

(For readability, I added comments on the right with the names of functions and variables. They will not be present in the code that an attacker sees.)

(To understand what is shown above, an attacker needs a good knowledge of the assembly language instructions of a specific CPU architecture, in this case Intel x86-64.)

I should also point out that a disassembly snippet like the one above is the end result of an attacker's work. The real challenge is locating it among millions of other, similar-looking lines of code. Not many people can do it, which is why software "cracking" is a special skill.
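A common first step for narrowing that search is to look for the text of the error message, since it has to be stored somewhere in the binary. The sketch below assumes the message was compiled from the L"..." literal in the C++ snippet, which Windows compilers store as UTF-16 little-endian; the helper names asciiToUtf16le() and findBytes() are hypothetical. A match only gives the location of the string's data. The disassembler's cross-references then typically lead to the code that passes it to MessageBox and, from there, to the conditional jump that decides which message is shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encode an ASCII string the way a Windows C++ compiler stores an L"..."
// literal: UTF-16 little-endian, so each ASCII character becomes {c, 0x00}.
std::vector<std::uint8_t> asciiToUtf16le(const std::string& text) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(text.size() * 2);
    for (unsigned char c : text) {
        bytes.push_back(c);     // low byte: the ASCII code
        bytes.push_back(0x00);  // high byte: zero for ASCII characters
    }
    return bytes;
}

// Return the offset of the first occurrence of `needle` inside `image`
// (the raw bytes of an .exe or .dll), or -1 if it does not occur.
long long findBytes(const std::vector<std::uint8_t>& image,
                    const std::vector<std::uint8_t>& needle) {
    auto it = std::search(image.begin(), image.end(),
                          needle.begin(), needle.end());
    return it == image.end() ? -1 : static_cast<long long>(it - image.begin());
}
```

In practice the image would be the whole file read into memory, and the needle would be something like asciiToUtf16le("Sorry, your registration code is incorrect").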

What's Next?

So, having found the code snippet above in the software's binary file, a software cracker has two choices:

Modify (or patch) the binary.
Reverse-engineer the isRegistrationCodeGood() function and copy its logic to create what is known as a "KeyGen" or "Key Generator".

Let's review both.

Binary Patch

The first choice is quite straightforward. Since the attacker got this far, he or she knows the Intel x64 instruction set quite well. If they simply change the conditional jump jnz short loc_7FF645671430 at address 00007FF645671418 (circled in red in the screenshots) to an unconditional jump, jmp short loc_7FF645671430, the "registration failed" branch can never be reached, and anything the user types in will be accepted as a valid registration.

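Also note that this modification can be achieved by changing just one byte in the binary code, from 0x75 to 0xEB. On x86-64, 0x75 is the opcode of a short JNZ (jump if not zero) and 0xEB is the opcode of a short, unconditional JMP; the one-byte jump offset that follows the opcode stays the same.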
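To see why a single byte is enough, here is a minimal sketch of how the two short jumps are encoded, using the addresses from the text. It assumes the instruction at 00007FF645671418 is the two-byte short form, which the short keyword in the disassembly indicates; encodeShortJump() is a hypothetical helper, not part of any tool.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <stdexcept>

// x86/x86-64 opcodes of the two-byte "short" jumps (8-bit displacement).
constexpr std::uint8_t kJnzShort = 0x75; // JNZ/JNE rel8: jump if not zero
constexpr std::uint8_t kJmpShort = 0xEB; // JMP rel8: always jump

// Encode a short jump that sits at address `from` and targets address `to`.
// The displacement is measured from the end of the 2-byte instruction.
std::array<std::uint8_t, 2> encodeShortJump(std::uint8_t opcode,
                                            std::uint64_t from,
                                            std::uint64_t to) {
    const std::int64_t disp = static_cast<std::int64_t>(to) -
                              static_cast<std::int64_t>(from + 2);
    if (disp < -128 || disp > 127) {
        throw std::out_of_range("target is too far for a short jump");
    }
    return {opcode, static_cast<std::uint8_t>(disp)};
}
```

For the jump in the text, both encodings get the displacement 0x16 (0x...1430 minus the address of the next instruction, 0x...141A), so the JNZ is 75 16 and the JMP is EB 16: only the opcode byte differs.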

But this approach comes at the "price" of modifying the original binary file. For that, the attacker needs to write his or her own "patcher", a small program that applies the modification described above. The downside of this approach for the attacker is that patching the original executable breaks its digital signature, which may alert the end user or the vendor. Additionally, the attacker's "patcher" program can easily be flagged and blocked by the end user's antivirus software, or lead criminal investigators to the attacker's identity.
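A patcher can be very small. The sketch below is a hypothetical illustration, not any real tool: BytePatch, applyPatch() and patchFile() are made-up names. It assumes the file offset of the jump is already known. The address 00007FF645671418 that a debugger shows is a virtual address in memory, and translating it into a position inside the .exe requires the PE file's section table, which this sketch does not handle. Checking the expected byte first keeps the patcher from corrupting a different version of the program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical patch description: an offset in the FILE plus the byte we
// expect to find there and the byte to write instead.
struct BytePatch {
    std::size_t fileOffset;
    std::uint8_t expected;    // e.g. 0x75 (short JNZ)
    std::uint8_t replacement; // e.g. 0xEB (short JMP)
};

// Patch an in-memory copy of the file. Refuses if the offset is out of range
// or the byte there is not the expected one (wrong build, already patched).
bool applyPatch(std::vector<std::uint8_t>& image, const BytePatch& p) {
    if (p.fileOffset >= image.size() || image[p.fileOffset] != p.expected) {
        return false;
    }
    image[p.fileOffset] = p.replacement;
    return true;
}

// Read the whole file, apply the patch, and write the result back in place.
bool patchFile(const std::string& path, const BytePatch& p) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::vector<std::uint8_t> image((std::istreambuf_iterator<char>(in)),
                                    std::istreambuf_iterator<char>());
    in.close();
    if (!applyPatch(image, p)) return false;
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(image.data()),
              static_cast<std::streamsize>(image.size()));
    return static_cast<bool>(out);
}
```

Rewriting the file this way is exactly what invalidates its digital signature, as described above.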

KeyGen

The second choice is a bit trickier. Here the attacker has to study the isRegistrationCodeGood() function and copy its logic into a small program of his or her own. That program duplicates the logic implemented in the original software and can generate a registration code for any name, giving any unscrupulous user of that software the ability to register it without paying.
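Here is a minimal sketch of the idea, built around a made-up registration scheme: the code is the 32-bit FNV-1a hash of the name plus a secret salt, written as eight hex digits. Real products use their own, usually stronger, proprietary schemes; kSecretSalt, codeFor() and keygen() are hypothetical, and isRegistrationCodeGood() here is only a stand-in for the function in the snippet above. The point is that the KeyGen does not need to break anything: once the check has been reverse-engineered, it simply runs the same computation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// HYPOTHETICAL secret that the shipped program must contain to check codes.
constexpr const char* kSecretSalt = "MyApp-Salt";

// 32-bit FNV-1a hash.
std::uint32_t fnv1a(const std::string& s) {
    std::uint32_t h = 2166136261u;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;             // FNV prime (wraps modulo 2^32)
    }
    return h;
}

// The program's internal "proprietary" routine: name -> expected code.
std::string codeFor(const std::string& name) {
    char buf[9];  // 8 hex digits + terminating NUL
    std::snprintf(buf, sizeof buf, "%08X",
                  static_cast<unsigned>(fnv1a(name + kSecretSalt)));
    return std::string(buf);
}

// What the shipped software does when the user clicks "Register".
bool isRegistrationCodeGood(const std::string& name, const std::string& code) {
    return code == codeFor(name);
}

// What the KeyGen does: the very same computation, lifted out of the binary.
std::string keygen(const std::string& name) {
    return codeFor(name);
}
```

For example, isRegistrationCodeGood("Alice", keygen("Alice")) is true, and the same holds for any name the KeyGen is given. Schemes built on public-key signatures avoid this particular problem, because the program ships only the key that verifies codes, not the one that creates them; against such schemes an attacker falls back to patching.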

Vendors of many major software products understand the potential impact of the second method and try to prevent it by requiring what is known as "authentication". This is basically a second step after registration, in which the software submits the registration name and code to the company's web server, which replies whether the code is legitimate. Microsoft does this when you purchase Windows (they call it "Activate Windows"), and so do Adobe and many other companies. This second step may run behind the scenes while the software is running, and it will usually cancel a prior registration if it was obtained illegally.
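A minimal sketch of the server side, assuming the vendor simply records every name and code pair it has actually sold. ActivationServer and its methods are hypothetical, and the network round trip is left out. A KeyGen-produced code may pass the check inside the program, but the server has no record of selling it, so activation fails.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// HYPOTHETICAL vendor-side activation check.
class ActivationServer {
public:
    // Called by the vendor's store when a purchase completes.
    void recordSale(const std::string& name, const std::string& code) {
        sold_.insert(key(name, code));
    }

    // Called (over the network, in real life) when the client activates.
    bool activate(const std::string& name, const std::string& code) const {
        return sold_.count(key(name, code)) != 0;
    }

private:
    // A separator keeps ("ab", "c") and ("a", "bc") from colliding.
    static std::string key(const std::string& name, const std::string& code) {
        return name + '\n' + code;
    }

    std::unordered_set<std::string> sold_;
};
```

The design choice is that the secret (the list of real sales) never leaves the server, so there is nothing in the shipped binary for a KeyGen to copy.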

So, this is it. Now you know how software is "cracked" 😎

Why Is It Not Possible to Prevent It?

Let me now answer why it is not possible to prevent software cracking. It all boils down to the fact that any software code needs to be read either by the CPU (in the case of native binary code) or by an interpreter or JIT compiler (in the case of JavaScript or .NET code). This means that if there is a way to read and interpret something, then no matter how complex or convoluted the code's logic is, an attacker with enough knowledge and persistence will be able to read it as well, and thus find a way to "break it" as I described above.
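A toy illustration of this point, using XOR scrambling as a stand-in for real obfuscation (xorBytes() and the key 0x5A are hypothetical): the program has to carry both the scrambled bytes and the key, because it needs the plain text at run time. Whatever the program can decode, an attacker can decode too, either by running the same routine or by reading the result from memory in a debugger.

```cpp
#include <cassert>
#include <string>

// HYPOTHETICAL key embedded in the program next to the scrambled data.
constexpr unsigned char kKey = 0x5A;

// XOR every byte with the key. Applying it twice restores the original,
// so the same routine both scrambles and unscrambles.
std::string xorBytes(const std::string& in, unsigned char key) {
    std::string out = in;
    for (char& c : out) {
        c = static_cast<char>(static_cast<unsigned char>(c) ^ key);
    }
    return out;
}
```

Real obfuscators are far more elaborate, but they share the same limitation: the decoding logic has to be present in the program for the program to work.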

There is an argument, though, that cloud-based software is more secure. That is true, since its binary code stays on the server and end users have no direct access to it. But even though cloud-based software is definitely the future, it has some major drawbacks that will keep it from ever fully replacing conventional software. To name just a few:

Not everyone has an internet connection or is willing to upload their data online. Additionally, some people's internet connection may be very expensive, or so slow that it makes the software lag badly.
Then there's the question of distributed computing. For instance, Blizzard Entertainment would never make "World of Warcraft" run fully on its servers because of the immense computational resources needed to render every single scene for every player. It is in their best interest to let each player's computer do the rendering instead.

Countermeasures & Consequences

As a software developer myself, I obviously don't like it when people steal software licenses. But I have to accept it and live with it. The good news is that not many people are willing to go the extra mile and search for a cracked version of software. The main problem for those who do is that by downloading a patched executable, or an attacker's KeyGen or patcher, they are effectively "trusting" the attacker not to put anything "nasty" into it that was not "advertised on the package": trojans, malware, keyloggers, or even ransomware. So the question for those people becomes: is saving the cost of a software license worth potentially infecting your system with a nasty virus?

On the other side of the equation, some developers react very negatively to any attempt to steal their software licenses. (I was there too.) They implement all kinds of countermeasures: anything from tricking reverse engineers, to adding "booby traps" that may do something nasty if the code detects that it is being debugged, to obfuscating (or scrambling) the code, to enforcing all kinds of convoluted DRM schemes, to blocking users from certain countries.
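For a sense of how the "am I being debugged?" check behind such booby traps can work, here is a minimal Linux-only sketch. On Linux, /proc/self/status contains a TracerPid: line that is non-zero while another process, typically a debugger, is attached via ptrace; on Windows the rough equivalent is the Win32 function IsDebuggerPresent(). The names tracerPresentIn() and isBeingDebugged() are hypothetical. Note that the result still feeds a conditional jump, which can be patched exactly like the registration check above.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Parse the "TracerPid:" field from the text of /proc/<pid>/status.
// A non-zero value means some process is ptrace-attached to this one.
bool tracerPresentIn(const std::string& statusText) {
    std::istringstream lines(statusText);
    const std::string field = "TracerPid:";
    std::string line;
    while (std::getline(lines, line)) {
        if (line.compare(0, field.size(), field) == 0) {
            // stoi skips the leading tab before the number.
            return std::stoi(line.substr(field.size())) != 0;
        }
    }
    return false;  // field not found (e.g. not Linux): assume no tracer
}

// Check the current process. If the file cannot be read, the text is empty
// and the function reports "not debugged".
bool isBeingDebugged() {
    std::ifstream f("/proc/self/status");
    std::stringstream buf;
    buf << f.rdbuf();
    return tracerPresentIn(buf.str());
}
```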

I personally try to stay away from all of those measures. And here's why:

Any anti-reverse-engineering tactic can be bypassed by an attacker with enough persistence. So why bother and waste my time when I can invest it in adding something useful that makes my software more productive for legitimate users?
Some code packers can trigger false positives in antivirus software, which is obviously bad for marketing that software. They also add unnecessary complexity when the developer needs to debug the software.
Adding "booby traps" in the code can also misfire on your legitimate users, which will really infuriate them and can even lead to lawsuits.
Any DRM scheme will probably catch some 100 illegal users while greatly inconveniencing 1,000 legitimate ones. So why do that to your good customers?
Our statistics show that about 75% of all illegal licenses come from a very small number of countries, such as China, Russia, Iran, and Brazil, to name the worst offenders. (I also understand that the reason may lie in the much lower incomes people have in those countries.) The main issue for us, though, was that if we enforced DRM or added strong registration authentication, many people who don't want to pay for our license would simply use a stolen credit card. It would cost them far less, and we would have no control over it. Our system would send them a legitimate license, only to have the payment reversed by the bank a few weeks later.
And lastly, however counterintuitive this may sound, some companies may actually benefit from allowing pirated copies of their software. Microsoft, for instance, gets a lot of free publicity from people using its Windows OS, and the same goes for Adobe with Photoshop.

Final Thoughts

My philosophy is now this: if someone wants to go the extra mile and steal our software, then go for it! They went this far to do it anyway, so they probably have a good reason. On the positive side, the many users who appreciate the work that goes into creating software greatly outnumber those who don't.

So, are you a software developer? What do you think? Do you impose DRM? Or, are you a software user? What's your take on it?
