Search code examples
javaexcelencryptionjackcess

Jackcess: wrong charset for MSAccess database


I have a MS-Access database which has "encrypted" Strings in it. These look like that: encrypted strings

However, I quickly noticed that the length of these Strings matches exactly the length of the plaintext (I know the plaintext). So with a bit of trying with Excel, I found out that if you use the =CODE(<char>)-function (so you get the character-code in the default charset, and =CHAR(<number>) viceversa) and xor this number with the character code of the letter the symbol should represent you always get the same result. Which means I just have to create an array with these values in java and voila. Excel example (on the right the mentioned "Array"): excel example decoding Example: ">>" has an index of (dec) 187, so 187xor253 yields 70 => "F"

Now, I use jackcess to access these values and the "decryption" is mostly fine but I sometimes get the wrong character out of strings. In Excel everything works just fine. Code with the best results:

public static final int[] DECRYPT_KEY = { 253, 203, 204, 217, 226, 205, 128, 201, 222, 183, 58, 217, 230, 201, 183, 211, 158, 203, 167, 213, 35, 33, 201, 123, 186, 247 };

public static void main(String[] args) throws IOException
{
    System.out.println(System.getProperty("file.encoding"));


    Database db = DatabaseBuilder.open(new File("/home/***/TM.db"));                
    Table table = db.getTable("personal");

    for (Row row : table)
    {
        String vorname = row.getString("vorname");
        byte[] vornameArr = vorname.getBytes("cp1252");
        for (int i1 = 0; i1 < vornameArr.length; i1++)
        {               
            vornameArr[i1] = (byte) ((vornameArr[i1] & 0xff) ^ DECRYPT_KEY[i1]);
        } 

        System.out.println(new String(vornameArr, "cp1252"));
    }
}

But as I said, some characters are still wrong, in Excel however everything is fine. When I print out the number which getBytes("cp1272") gives, it is completely different to the Excel one.

Do you have any ideas, what I could be doing wrong and why java gives sometimes such different values than Excel? What would be a better approach? I already tried all combinations of charsets, some worked where others failed but then had other wrong results.


Solution

  • So thanks to @Gord Thompson and the website he suggested (fileformat.info) I finally found an answer: Sometimes characters look similar, and for some reason in the database the "higher" ones are preferred (such as unicode character 402 and 131). My java code expected everything to have the lower value, as excel provided it. So, if the code is higher than 255 it needs to be substituted by a lower value. For some reason, getBytes("cp1252") will always return that lower value, however toCharArray() and getBytes("UTF-16LE") will return the higher, correct value (Compare: fileformat 192)

    So my code is like this now and works perfectly:

    String vorname = row.getString("vorname");
    char[] vornameArr = vorname.toCharArray();          
    for (int i = 0; i < vornameArr.length; i++)
    {
        if (vornameArr[i] > 255)
        {
            vornameArr[i] = (char) (String.valueOf(vornameArr[i]).getBytes("cp1252")[0] & 0xff);
        }
    
        vornameArr[i] = (char) (vornameArr[i] ^ DECRYPT_KEY[i]);
    }
    
    System.out.println(String.valueOf(vornameArr));
    

    Thank you very much for your help!