I have a bunch of old AES-encrypted Strings encrypted roughly like this:
Now I would like to change the encoding to UTF8 for new values (eg. '€' does not work with ISO-8859-1). This will of course cause problems if I try to decrypt the old ISO-8859-1 encoded values with UTF-8 encoding:
org.junit.ComparisonFailure: expected:<!#[¤%&/()=?^*ÄÖÖÅ_:;>½§@${[]}<|'äöå-.,+´¨]'-Lorem ipsum dolor ...> but was:<!#[�%&/()=?^*����_:;>��@${[]}<|'���-.,+��]'-Lorem ipsum dolor ...>
I'm thinking of creating some automatic encoding fallback for this.
So the main question would be that is it enough to inspect the decrypted char array for '�' characters to figure out encoding mismatch? And what is the 'correct' way to declare that '�' symbol when comparing?
if (new String(utf8decryptedCharArray).contains("�")) {
// Revert to doing the decrypting with ISO-8859-1
decryptAsISO...
}
When decrypting, you get back the original byte sequence (result of your step 1), and then you can only guess whether these bytes denote characters according to the ISO-8859-1 or the UTF-8 encoding.
From a byte sequence, there's no way to clearly tell how it is to be interpreted.
A few ideas:
EF BB BF
) aren't strictly illegal in ISO-8859-1 (being read as 
), they are highly unlikely. Then, when your decrypted bytes start with EF BB BF
, decode to string using UTF-8, otherwise using ISO-8859-1. Still, there's a non-zero probability of error.If ever possible, I'd go for migrating the existing entries. Otherwise, you'll have to carry on with "old-format compatibility stuff" in your code base forever, and still can't absolutely guarantee correct behaviour.