Addendum 1. Introduction To Encoding And Encryption
Encoding and Encryption are techniques that transform a string of characters into some other form for a specific reason. In the sense that they are used in computing, encoding is a transformation that alters the look of the object, so that the result meets some specific criteria. Encryption is a transformation designed to disguise or hide the original contents.
Encoding
Encoding changes the format of an object to meet some criteria. It is a reversible process, so that the encoded format can later be decoded to recover the original object.
The Encoding Process
Let us say that you want to send a message consisting of a normal English language sentence:
SECURITY IS IMPORTANT.
However, there is a restriction that you may only send the decimal digits: 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9.
To do this, we use a simple set of rules:
Instead of A, send the digits 01;
Instead of B, send the digits 02;
Instead of C, send the digits 03;
Instead of D, send the digits 04;
Instead of E, send the digits 05;
......
Instead of X, send the digits 24;
Instead of Y, send the digits 25;
Instead of Z, send the digits 26;
Instead of the space character, send the digits 27;
Instead of the period character, send the digits 28.
We take the original sentence, and replace each character with its code:
19 replaces the S
05 replaces the E
03 replaces the C and so forth
We can now send the string:
19050321180920252709192709131615182001142028
With spaces added to make it more legible, it looks like this:
19 05 03 21 18 09 20 25 27 09 19 27 09 13 16 15 18 20 01 14 20 28
When the message is received, the recipient does a reverse translation:
S replaces the 19
E replaces the 05
C replaces the 03 and so forth resulting in the original sentence.
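The rules above can be sketched in a few lines of Python. This is only an illustration of the scheme described here; the names ALPHABET, ENCODE, DECODE, encode and decode are chosen for this example.

```python
import string

# Code table from the rules above: A=01 ... Z=26, space=27, period=28.
ALPHABET = string.ascii_uppercase + " ."
ENCODE = {ch: f"{i:02d}" for i, ch in enumerate(ALPHABET, start=1)}
DECODE = {code: ch for ch, code in ENCODE.items()}


def encode(message: str) -> str:
    """Replace every character with its two-digit code."""
    return "".join(ENCODE[ch] for ch in message)


def decode(digits: str) -> str:
    """Split the digit string into pairs and look each pair up."""
    pairs = (digits[i:i + 2] for i in range(0, len(digits), 2))
    return "".join(DECODE[pair] for pair in pairs)


encoded = encode("SECURITY IS IMPORTANT.")
print(encoded)          # 19050321180920252709192709131615182001142028
print(decode(encoded))  # SECURITY IS IMPORTANT.
```

Because every code is exactly two digits long, the recipient can split the string unambiguously without needing any separators.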
Encoding Applications
The main application of encoding that we will consider is the transmission of e-mail attachments. E-mail was originally designed for sending English-language text. It was based on the ASCII character set which allows 128 unique characters. 128 is sufficient for representing the 26 letters of the English alphabet in upper and lower case, the 10 digits, a number of special characters (such as comma, period, brackets, etc.) and a variety of control characters (such as tab and end-of-line).
Unfortunately, many languages include more characters than English. Programs, word processing files, pictures, and many other types of files are composed of 8-bit bytes, each of which can take any of 256 values. None of these could be sent in e-mail as it was originally designed.
To overcome this problem, the concept of attachments was developed, in which the file to be transmitted would first be encoded so that it would only contain the legal ASCII characters. This process is similar to how our sample sentence was encoded using only digits. As with our sample, the resultant encoded message is longer than the original, but it can be transmitted legally, and, when received, decoded into its original form.
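As a rough illustration, the sketch below uses Base64, one widely used scheme for turning arbitrary bytes into ASCII text. The text above does not name a particular scheme, so the choice of Base64 here is an assumption made for the example.

```python
import base64

# Four arbitrary bytes; the values above 127 fall outside the ASCII range.
raw = bytes([0x00, 0xFF, 0x80, 0x7F])

encoded = base64.b64encode(raw)      # bytes containing only ASCII characters
decoded = base64.b64decode(encoded)  # the original bytes again

print(encoded.decode("ascii"))       # AP+Afw==
print(len(raw), len(encoded))        # 4 8
print(decoded == raw)                # True
```

As with the digit example, the encoded form is longer than the original (here 8 characters for 4 bytes), but it decodes back to exactly the same bytes.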
Unicode
Unicode is a method of encoding all characters used in all commonly used languages so that computers may handle them uniformly. Details are available from the Unicode Consortium (http://www.unicode.org). In brief:
“Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters, for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English, no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.
These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially a server) needs to support many different encodings, yet whenever data is passed between different encodings or platforms, that data always runs the risk of corruption. Unicode is changing all that!
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystem, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others.”
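The idea of one number per character can be seen directly in Python, whose strings are sequences of Unicode characters. In this small sketch the three sample characters are arbitrary, and UTF-8 (one common way of storing those numbers as bytes) is not discussed in the text above and is shown only for illustration.

```python
def describe(ch: str) -> tuple:
    """Return the Unicode number, its usual U+ notation, and its UTF-8 bytes."""
    number = ord(ch)  # ord() gives the Unicode number of a character
    return number, f"U+{number:04X}", ch.encode("utf-8")


for ch in ["A", "é", "€"]:
    print(ch, *describe(ch))

print(chr(8364))  # chr() goes the other way: number to character
```

Note that "A" keeps the number 65 that it has in ASCII, while the other two characters have numbers that ASCII cannot represent.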
Encryption
Encryption is similar to encoding in that the process transforms some original text or object into another form. In this case, the intent is to hide the original contents.
There are three types of encryption that we will be looking at:
• Symmetric Encryption
• Public-key Encryption
• One-way Hash Encryption
Symmetric Encryption
In its simplest form, symmetric encryption is similar to encoding. The characters in the original object are transformed. A very simple-minded encryption algorithm (the rules governing the process) is to take each alphabetic character and replace it with the next character in the alphabet. So:
A is replaced by B
B is replaced by C
C is replaced by D
......
X is replaced by Y
Y is replaced by Z
Z is replaced by A (at the end of the alphabet, it loops back to the beginning)
If we use this algorithm, our sample sentence becomes (leaving the spaces unchanged and dropping the period in this simple case):
TFDVSJUZ JT JNQPSUBOU
The message is now disguised. The recipient does the reverse translation, replacing each letter with the one before it in the alphabet, and obtains the original sentence.
Instead of shifting each character 1 place, we could have shifted them some other number of characters. As long as the recipient knows the number of shifts, they can decrypt the message.
The number of shifts is called the encryption key. This same number is used to encrypt the message, and later decrypt it. Julius Caesar used this encryption method to keep messages he sent secret (he used a key of 3).
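A minimal sketch of this shifting algorithm, with the number of shifts passed in as the key. The function names shift_encrypt and shift_decrypt are chosen for this example, and only the capital letters A to Z are shifted.

```python
import string

LETTERS = string.ascii_uppercase


def shift_encrypt(text: str, key: int) -> str:
    """Shift each letter A-Z forward by `key` places, wrapping Z back to A."""
    out = []
    for ch in text:
        if ch in LETTERS:
            out.append(LETTERS[(LETTERS.index(ch) + key) % 26])
        else:
            out.append(ch)  # spaces and punctuation pass through unchanged
    return "".join(out)


def shift_decrypt(text: str, key: int) -> str:
    """Undo the shift by moving each letter back by the same key."""
    return shift_encrypt(text, -key)


secret = shift_encrypt("SECURITY IS IMPORTANT", key=1)
print(secret)                    # TFDVSJUZ JT JNQPSUBOU
print(shift_decrypt(secret, 1))  # SECURITY IS IMPORTANT
```

The same key is used in both directions, which is what makes the scheme symmetric.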
With this simple algorithm, if the message is intercepted and the interceptor understands the concept of encryption, he or she might be able to recover the contents by trying various shifts. If the algorithm were more complex than simply shifting each letter by the same amount, the message would be more difficult to decipher. Until recently, many encryption algorithms were just such shifting algorithms.
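To see how little protection a simple shift gives, the sketch below tries every possible key on an intercepted message. The helper names shift and candidates are chosen for this example. Since there are only 25 useful shifts, a person can read through all the results in a few seconds.

```python
import string

LETTERS = string.ascii_uppercase


def shift(text: str, key: int) -> str:
    """Shift each letter A-Z by `key` places; leave other characters alone."""
    return "".join(
        LETTERS[(LETTERS.index(ch) + key) % 26] if ch in LETTERS else ch
        for ch in text
    )


def candidates(ciphertext: str) -> dict:
    """Try every possible key and return the decryption each one produces."""
    return {key: shift(ciphertext, -key) for key in range(1, 26)}


for key, guess in candidates("TFDVSJUZ JT JNQPSUBOU").items():
    print(key, guess)
```

Only one line of the output (the one for key 1) reads as sensible English, which is how the interceptor recognizes the right key.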
Today, instead of shifting letters, we use mathematical formulas to encrypt messages. We still use a key and this key is part of the formula to perform the encryption. If you want to decrypt the message, you need the key. If you don’t have the key, you could, of course, try various keys until the message made sense. If the key was restricted to the numbers from 1 to 10, this guessing would not take very long. If it were allowed values from 1 to 100, it would probably take longer. Today, keys typically are 128-bit binary numbers. That is equivalent to about 340,000,000,000,000,000,000,000,000,000,000,000,000 possible choices and guessing is not practical.
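The arithmetic behind that figure can be checked directly. In the sketch below, the guessing rate of one trillion keys per second is purely an assumed number for illustration, and the function names are chosen for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def key_count(bits: int) -> int:
    """Number of distinct keys that fit in `bits` binary digits."""
    return 2 ** bits


def years_to_try_all(bits: int, guesses_per_second: float) -> float:
    """Time needed to try every key at the given (assumed) guessing rate."""
    return key_count(bits) / guesses_per_second / SECONDS_PER_YEAR


print(key_count(128))               # 340282366920938463463374607431768211456
print(years_to_try_all(128, 1e12))  # assumed rate: 10**12 guesses per second
```

Even at that assumed rate, trying every 128-bit key would take on the order of ten billion billion years.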
Symmetric encryption is used when it makes sense for both the sender and recipient to use the same key (that is, they need to agree on it ahead of time). It is used for encrypting messages while they are being transmitted (over a wireless link, for example) and for encrypting information on disk so that others cannot read it. In the latter case, if you lose the key, the data is essentially lost!
Public-key Encryption
Public-key encryption is similar to symmetric encryption with one major exception. Instead of one key, there are two. A different key is used to encrypt the message than is used to decrypt it. In a typical use, the first key is made public and anyone can learn it. If you want to send me a private message, you encrypt it with my public key, which I have given to everyone. To decrypt the message, my private key (which is different from my public key) is needed, and I do not share that key with anyone else. If your message is intercepted, no one other than me can read it.
Note that in this simple case, I cannot be sure who sent me the message, because anyone might have my public key, but you can be reasonably sure that only I can read it.
Public/private keys can also be used in reverse. In this case you encrypt the message with your private key, and anyone who has your public key can decrypt it.
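Both directions can be demonstrated with a toy example. The sketch below uses textbook RSA with two tiny primes; the text above does not name an algorithm, so RSA and all the numbers here are assumptions chosen for illustration. Real keys are hundreds of digits long, and real systems add further safeguards that are omitted here. The three-argument pow with a negative exponent requires Python 3.8 or later.

```python
# Toy key pair built from two tiny primes (illustrative values only).
p, q = 61, 53
n = p * q                    # 3233, part of both keys
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent: the public key is (e, n)
d = pow(e, -1, phi)          # private exponent: the private key is (d, n)


def apply_key(value: int, exponent: int) -> int:
    """Encryption and decryption are both modular exponentiation."""
    return pow(value, exponent, n)


message = 65  # in this toy scheme a message is a number smaller than n

# Anyone can encrypt with the public key; only the private key decrypts.
ciphertext = apply_key(message, e)
assert apply_key(ciphertext, d) == message

# In reverse: encrypt with the private key; the public key decrypts.
signed = apply_key(message, d)
assert apply_key(signed, e) == message

print(ciphertext, signed)
```

The two exponents undo each other, so whichever key is used to encrypt, only the other key of the pair can decrypt.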
One-way Hash Encryption
You can think of one-way hash encryption as a type of public-key encryption for which no one has the private key. So things can be encrypted, but not decrypted. It is different in that the encrypted message is typically relatively short. A common one-way hash encryption algorithm is called MD5. The output of the MD5 algorithm is always 128 bits (16 bytes). If you create hash codes for two different things, the chances are virtually zero that the two hash codes will be the same by accident.
There are two prime uses of such a code:
Authentication
You can take a long document or a program, compute the MD5 code for it, and keep the code in a safe place. Later, you can go back and compute the code again. If the new code is different from the original one, you will know that the document or program has been changed. Even a tiny change in a large document or program will result in a markedly different MD5 code.
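The check described above can be sketched with Python's standard hashlib module. The function name md5_code and the sample data are chosen for this example; a real document would normally be read from a file.

```python
import hashlib


def md5_code(data: bytes) -> str:
    """Return the MD5 hash of `data` as 32 hexadecimal characters (128 bits)."""
    return hashlib.md5(data).hexdigest()


original = b"SECURITY IS IMPORTANT."
saved_code = md5_code(original)          # keep this in a safe place

tampered = b"SECURITY IS IMPORTANT!"     # one character changed

print(saved_code)
print(md5_code(tampered))
print(md5_code(original) == saved_code)  # True: unchanged
print(md5_code(tampered) == saved_code)  # False: the change is detected
```

The two printed codes have the same length but look completely unrelated, even though the inputs differ by a single character.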
Storing passwords
In many systems, when a user sets a password, it is encrypted using MD5 (or a similar algorithm) and only that encrypted version is stored. When the user later attempts to sign on, what they enter is encrypted in the same way and compared to the version on disk. If the two match, the password was correct. Note that it is not possible to decrypt the password if the user forgets it; a new one must be set. This method is used because it never requires your password to be kept in its original form. Unfortunately, there is still one problem, and it is the reason why one should not use passwords that are short, simple, or guessable words: anyone who obtains a list of encrypted passwords (from a system they have broken into, for example) can easily encrypt all sorts of “easy” passwords and see whether the encrypted versions match those in the password table.
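Both the sign-on check and the guessing attack can be sketched briefly. The user names, passwords, and function names below are all made up for this example, and MD5 is used only because it is the algorithm named in the text.

```python
import hashlib


def hash_password(password: str) -> str:
    """One-way hash of a password, as hexadecimal text."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# What the system keeps on disk: user name -> hashed password only.
stored = {
    "alice": hash_password("letmein"),
    "bob": hash_password("T7#qv!Zr02pL"),
}


def check_login(user: str, attempt: str) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    return stored.get(user) == hash_password(attempt)


def guess_passwords(table: dict, wordlist: list) -> dict:
    """Hash each 'easy' word and see whether it appears in a stolen table."""
    lookup = {hash_password(word): word for word in wordlist}
    return {user: lookup[h] for user, h in table.items() if h in lookup}


print(check_login("alice", "letmein"))   # True
print(check_login("alice", "letmeout"))  # False
print(guess_passwords(stored, ["password", "123456", "letmein"]))
# {'alice': 'letmein'}
```

The guessable password is recovered without ever decrypting anything, while the long random one does not appear in the word list and stays hidden.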
Digital Signatures
If I want to send you a message, and ensure that you know that I was the one who sent it, I can use a combination of the encryption techniques:
• I compose the message, and I use MD5 to create a hash code for the message.
• I encrypt the hash code using my private key.
• I send you the message, and the encrypted hash code.
• You receive the message.
• You decrypt the hash code using my public key, which will result in the original hash code.
• You take the text of the message that I sent, and calculate an MD5 hash code from that.
• If the two hash codes are identical, then you can be sure that the message has not been changed since I sent it (otherwise it would result in a different hash code) and that I was the one who sent it (otherwise my public key would not have allowed you to decrypt the original hash code).
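The steps above can be sketched by combining an MD5 hash with a toy public/private key pair. Textbook RSA with tiny primes is assumed here purely for illustration (the text does not name the public-key algorithm), and because the toy key is so small, the hash code is reduced to a small number before it is encrypted. The function names are chosen for this example, and Python 3.8 or later is needed for pow with a negative exponent.

```python
import hashlib

# Toy RSA key pair (assumed values; far too small for real use).
p, q = 61, 53
n = p * q
e = 17                               # sender's public key: (e, n)
d = pow(e, -1, (p - 1) * (q - 1))    # sender's private key: (d, n)


def hash_code(message: bytes) -> int:
    """MD5 hash as a number, reduced modulo n so the toy key can handle it."""
    return int.from_bytes(hashlib.md5(message).digest(), "big") % n


def sign(message: bytes) -> int:
    """Sender: hash the message, then encrypt the hash with the private key."""
    return pow(hash_code(message), d, n)


def verify(message: bytes, signature: int) -> bool:
    """Recipient: decrypt the hash with the public key and compare."""
    return pow(signature, e, n) == hash_code(message)


message = b"SECURITY IS IMPORTANT."
signature = sign(message)
print(verify(message, signature))  # True
```

Only the short hash code is encrypted, not the whole message, so the message itself travels in readable form and the signature simply accompanies it.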
The Digital Certificates used by web browsers for secure authentication rely on digital signature techniques such as this one.