Search code examples
javaencryptionbytearraysxor

Get the key-String by only knowing the XOR-encrypted byte-arrays and the key-size


I have a key of a known size, for example:

String key = "A B C"; // Unknown / This is what I need to guess in the end
int keySize = key.length(); // Known

I know both the key and the texts only contain the following chars:

String AVAILABLE_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ .,!?-"; // Known

I have some texts that were encoded by XOR-ing the text with the key. The encode-method does the following: Checks if the key and UPPERCASE text aren't null nor empty and only contain valid chars, then creates UTF-8 byte-arrays of the Strings and XORs them together to one byte[]. (If the text is longer than the key the key gets repeated again.)

byte[][] encryptedTexts = new byte[5][];
// The original texts are Unknown, the encrypted byte-arrays are Known
encryptedTexts[0] = encode(key, "THIS IS A TEST");
encryptedTexts[1] = encode(key, "This is another test!"); // Note: encode first makes the String UPPERCASE, so this encrypts correctly.
encryptedTexts[2] = encode(key, "SOME OTHER RANDOM TEXT");
encryptedTexts[3] = encode(key, "AND LET'S SEE HOW THIS GOES"); // Should return null since ' in LET'S isn't valid
encryptedTexts[0] = encode(key, "OK, THAT WILL BE ENOUGH FOR NOW..");

After encoding I have the following encrypted byte-arrays (Arrays.toString(byte_array)):

ENCRYPTED TEXT 1: [21, 104, 11, 115, 99, 8, 115, 98, 97, 99, 21, 101, 17, 116]
ENCRYPTED TEXT 2: [21, 104, 11, 115, 99, 8, 115, 98, 97, 13, 14, 116, 10, 101, 17, 97, 116, 7, 115, 23, 96]
ENCRYPTED TEXT 3: [18, 111, 15, 101, 99, 14, 116, 10, 101, 17, 97, 114, 3, 110, 7, 14, 109, 98, 116, 6, 25, 116]
ENCRYPTED TEXT 4: null
ENCRYPTED TEXT 5: [14, 107, 110, 0, 23, 9, 97, 22, 0, 20, 8, 108, 14, 0, 1, 4, 0, 7, 110, 12, 20, 103, 10, 0, 5, 14, 114, 98, 110, 12, 22, 14, 108]

So, now my question: How can I get the key by only knowing the encrypted texts and the key-size?

Some thoughts:

  1. I know you can easily get the key by XOR-ing the original text with the encrypted text. Problem: I don't have the original text.
  2. I know you can partly decrypt one text by using another text's repeated words (like " the ") and then guess the other part. Problems: This only works when the text(s) are pretty long, contain the guessed word (like " the ") and ARE words in general. This method won't work when the original texts are also just randomly generated characters, even when the size is very large / 100,000+.
  3. I know that XOR-ing the same characters with each other will return a 0-byte. In the example above, with the 5th encrypted text, we see a few 0's. When a 0 is found this means that the original text and the key share the same character at the same index. Problem: I don't have the original text.

Is it even possible to get the key when you only know the encrypted byte-arrays (inifite amount of them) and the key-size? And if yes, what would be the best approach?

Some NOTES:

  • I don't care about decrypting the encrypted texts, my goal is to get the key-String.
  • If you are going to post example code, please do this in Java, since that's the programming language I'm working with.
  • This is just an assignment (not for school, but for a Java cursus), so I'm not going to crack something with it. (Although I would probably laugh at people that use XOR-encryption with the same key.. XOR-encryption should only be done with a truly-random generated key of the same size as the text or larger, also known as an One-Time Pad. Quote: "With a key that is truly random, the result is a one-time pad, which is unbreakable even in theory." [source].)

EDIT 1:

Ok, forget about the random generated unencrypted texts, let's just assume I have a large English text that has been encrypted. If I know beforehand that the text is English, I can use a Letter Frequency Analysis Table. So then I not only know the encrypted texts and the key-size, but also these frequencies of the letters. How can I use this additional frequencies in order to get the key. (Let's assume I have infinite amount of encrypted text to my possession for recreating / guessing the key using XOR decryption.)


Solution

  • You might only be interested in the key, but try to instead focus on getting one of the plaintexts. This will of course trivially yield the key.

    Start by xor'ing pairs of plaintexts together (if they are different lengths, truncate the longest). This removes the key and leaves you with pairs of english sentence(-fragments) xor'ed together.

    Assuming unlimited ciphertexts we can take an easy approach:

    Take one ciphertext and xor it together with say 1000 other ciphertexts. Look at all positions where the 6th bit is 1 in about 90% of the pairs. These positions must have one of [ .,!?-] in the first ciphertext with about a 80% probability of being a space. Assume this is a space and compute what the equivalent key-byte must be if this is true.

    Repeat this for a bunch of other ciphertexts and you will be able to decide which of the [ .,!?-] actually was spaces (~80% will have the same key-value in this spot).

    Here is an implementation in Java. It usually uses a few thousand messages to find the key:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Random;
    
    public class MultitimePad {
        private final static int SPACE_TEST_NUM = 10;
        private final static int SPACE_TEST_MIN = 8;
        private final static int KEY_GUESS_MIN = 10;
        private final static double KEY_GUESS_MIN_PERCENTAGE = 0.8;
    
        public static void main(String[] args) throws IOException {
            MultitimePad p = new MultitimePad();
            byte[] key = new byte[256];
            new Random().nextBytes(key);
            byte[][] messages = p.generate(key);
            byte[] solvedKey = p.solve(key.length, messages);
            if (compareBytes(key, solvedKey)) {
                System.out.println("Success");
            } else {
                System.out.println("Failure");
            }
        }
    
        private byte[][] generate(byte[] key) throws IOException {
            byte[] data = Files.readAllBytes(Paths.get("src/ulysses.txt"));
            byte[] filteredData = new byte[data.length];
            int filteredDataLength = 0;
            for (int i = 0; i < data.length; i++) {
                byte p = data[i];
                if (p >= 'a' && p <= 'z') {
                    filteredData[filteredDataLength] = (byte) (p - 'a' + 'A');
                    filteredDataLength++;
                } else if (p >= 'A' && p <= 'Z') {
                    filteredData[filteredDataLength] = p;
                    filteredDataLength++;
                } else if (p == ' ' || p == '.' || p == ',' || p == '!' || p == '?' || p == '-') {
                    filteredData[filteredDataLength] = p;
                    filteredDataLength++;
                }
            }
            int numMessages = filteredDataLength / key.length;
            byte[][] messages = new byte[numMessages][];
            for (int i = 0; i < numMessages; i++) {
                messages[i] = new byte[key.length];
                for (int j = 0; j < key.length; j++) {
                    byte p = filteredData[i * key.length + j];
                    messages[i][j] = (byte) (p ^ key[j]);
                }
            }
            return messages;
        }
    
        private static boolean compareBytes(byte[] b1, byte[] b2) {
            if (b1.length != b2.length) {
                return false;
            }
            for (int i = 0; i < b1.length; i++) {
                if (b1[i] != b2[i]) {
                    return false;
                }
            }
            return true;
        }
    
        private byte[] solve(int length, byte[][] messages) {
            byte[] key = new byte[length];
            for (int i = 0; i < length; i++) {
                key[i] = solvePosition(i, messages);
            }
            return key;
        }
    
        private byte solvePosition(int pos, byte[][] messages) {
            int[] keyGuessCount = new int[256];
            int totalKeyGuess = 0;
            for (int i = 0; i < messages.length - SPACE_TEST_NUM; i++) {
                int success = 0;
                for (int j = 0; j < SPACE_TEST_NUM; j++) {
                    if (((messages[i][pos] ^ messages[i + j][pos]) & ' ') != 0) {
                        success++;
                    }
                }
                if (success >= SPACE_TEST_MIN) {
                    int keyGuess = (messages[i][pos] ^ ' ') & 0xFF;
                    keyGuessCount[keyGuess]++;
                    totalKeyGuess++;
                    if (keyGuessCount[keyGuess] >= KEY_GUESS_MIN && keyGuessCount[keyGuess] > totalKeyGuess *
                            KEY_GUESS_MIN_PERCENTAGE) {
                        System.out.println("Found " + pos + " using " + (i + 1 + SPACE_TEST_NUM) + " messages");
                        return (byte) keyGuess;
                    }
                }
            }
            throw new IllegalArgumentException("Too few messages");
        }
    }