
Replace Unicode escapes with the corresponding character


I'm trying to convert Unicode escapes, such as \u00FC, to the characters they represent.

import javax.swing.JOptionPane;

public class Test {
    public static void main(String[] args) {
        String in = JOptionPane.showInputDialog("Write something in here");
        System.out.println("Input: " + in);
        // Do something before this line
        String out = in;
        System.out.print("And Now: " + out);
    }
}

An example to explain what I mean:

First Console line: Input: Hall\u00F6

Second Console line: And Now: Hallö

EDIT: The Trombone Willy's answer sometimes failed on input containing multiple Unicode escapes, so here is a fixed version of the code:

public static String unescapeUnicode(String s) {
    StringBuilder r = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
        // An escape is a backslash, 'u' and four hex digits: six characters in total.
        if (s.length() >= i + 6 && s.startsWith("\\u", i)) {
            r.append(Character.toChars(Integer.parseInt(s.substring(i + 2, i + 6), 16)));
            i += 5; // skip five characters here; the loop's i++ skips the sixth
        } else {
            r.append(s.charAt(i));
        }
    }
    return r.toString();
}

Solution

  • Joao's answer is probably the simplest, but this function can help when you don't want to download the Apache jar, whether for space, portability, or licensing reasons, or simply to avoid the extra dependency. Since it does much less, I also expect it to be faster. Here it is:

    public static String unescapeUnicode(String s) {
        StringBuilder sb = new StringBuilder();

        // Start of the text that has not been copied into sb yet.
        int oldIndex = 0;

        // Only look where a full six-character escape still fits, so a
        // truncated escape at the end of the input is copied through as-is.
        for (int i = 0; i + 6 <= s.length(); i++) {
            if (s.startsWith("\\u", i)) {
                sb.append(s, oldIndex, i);
                int codePoint = Integer.parseInt(s.substring(i + 2, i + 6), 16);
                sb.append(Character.toChars(codePoint));

                i += 5;
                oldIndex = i + 1;
            }
        }

        sb.append(s.substring(oldIndex));

        return sb.toString();
    }
    

    I hope this helps! (You don't have to give me credit for this; I release it into the public domain.)
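If malformed input is a concern (for example `\u12` or `\uZZZZ`, where `Integer.parseInt` would throw), a regex-based variant is one way to stay dependency-free while leaving such sequences untouched. This is only a sketch of a different technique, not the code from the answers above; the class name `UnicodeUnescaper`, the method `unescape`, and the constant `ESCAPE` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative only.
public class UnicodeUnescaper {
    // A backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    public static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Four hex digits always fit in a single UTF-16 code unit.
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded '$' or backslash from being
            // interpreted as replacement syntax.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        // Copy whatever follows the last match, including malformed escapes.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("Hall\\u00F6"));
        System.out.println(unescape("bad \\u12 escape"));
    }
}
```

Because the pattern only matches well-formed escapes, anything else falls through `appendTail` or the text between matches unchanged. `StringBuffer` is used so the sketch also compiles on Java versions before 9, where `appendReplacement` has no `StringBuilder` overload.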