Tags: java, utf-8, character-encoding, special-characters

Get original string in Java, encoded in unknown format in legacy application


I am trying to recover an original string that a legacy application stored in an encoding I cannot identify.

For example, when a user enters Special[Home]^ in the legacy system, it saves "Special¢Home!¬" into the DB2 database.

This works correctly within the legacy system itself, which encodes and decodes the value consistently and displays it as Special[Home]^.

When I read the same value from a Java app, however, it comes back as Special¢Home!¬.

I also tried to find the right encoding using the code shown below, but it does not work. Any help would be appreciated.

@Test
public void charsetTest() {
  String encodedString = "Special¢Home!¬";
  String originalString = "Special[Home]^";
  Map<String, Charset> availableCharsets = Charset.availableCharsets();
  Set<String> keySet = availableCharsets.keySet();
  for (String key : keySet) {
    Charset charset = availableCharsets.get(key);
    try {
      String decodedString = new String(charset.encode(encodedString).array(), charset);
      System.out.println(decodedString + ":  " + charset);
      if (originalString.equals(decodedString)) {
        System.out.println("match found: -> " + originalString + ":  " + charset);
      }
    } catch (UnsupportedOperationException e) {
      /*  e.printStackTrace(); */
    }
  }
}

Solution

  • The text may have been encoded with one charset and decoded with a different one. To test every such pair, try this code:

    import java.nio.charset.Charset;
    import java.util.Collection;

    public class CharsetPairSearch {
      public static void main(String[] args) {
        String encodedString = "Special¢Home!¬";
        String originalString = "Special[Home]^";
        Collection<Charset> charsets = Charset.availableCharsets().values();
        // Try every (encode, decode) pair: re-encode the misread text with one
        // charset to recover the raw bytes, then decode those bytes with another.
        for (Charset encodeWith : charsets) {
          if (!encodeWith.canEncode()) {
            continue; // decode-only charsets cannot produce bytes
          }
          // getBytes returns exactly the encoded bytes, whereas
          // ByteBuffer.array() may include unused trailing capacity.
          byte[] bytes = encodedString.getBytes(encodeWith);
          for (Charset decodeWith : charsets) {
            String decodedString = new String(bytes, decodeWith);
            if (originalString.equals(decodedString)) {
              System.out.println(originalString + ":  " + encodeWith + " -> " + decodeWith);
            }
          }
        }
      }
    }
    

    It prints every pair of charsets (encode -> decode) that turns the stored string back into the original.

    Output:
    Special[Home]^:  IBM-Thai -> x-IBM1166
    Special[Home]^:  IBM-Thai -> x-IBM875
    Special[Home]^:  IBM01140 -> IBM01148
    Special[Home]^:  IBM01140 -> IBM500
    Special[Home]^:  IBM01140 -> IBM870
    ...
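
One of the pairs listed above, IBM01140 -> IBM500, suggests the value was written with one EBCDIC code page but is being decoded with another. The following is a minimal sketch of applying that single pair to repair a misread value. It assumes this pair is the right one for your data, which one sample cannot prove, since the other listed pairs fit it equally well. The class name EbcdicRepair and the method repair are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;

public class EbcdicRepair {

    // Re-encodes the misread text with the charset it was (wrongly) decoded
    // with, which recovers the raw bytes, then decodes those bytes with the
    // charset the data was presumably written in.
    // Assumption: both charset names are supported by the running JDK.
    static String repair(String misread, String readAs, String writtenAs) {
        byte[] raw = misread.getBytes(Charset.forName(readAs));
        return new String(raw, Charset.forName(writtenAs));
    }

    public static void main(String[] args) {
        // "Special¢Home!¬" written with Unicode escapes so the source file's
        // own encoding cannot affect the test value.
        String misread = "Special\u00A2Home!\u00AC";
        // Assumed pair, taken from the output above: IBM01140 -> IBM500.
        System.out.println(repair(misread, "IBM01140", "IBM500"));
    }
}
```

Before relying on a pair, it is worth checking it against several more stored values, ideally ones containing other special characters, because different code page pairs disagree on different characters. If the mismatch can be fixed where the data is read, for example in the database connection settings, that is generally preferable to repairing strings after the fact.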