java regex character-properties

Java Unicode Regular Expression


I have some text like this.

Every person haveue280 sumue340 ambition

I want to replace ue280 and ue340 with \ue280 and \ue340 using a regular expression.

Is there a solution?

Thanks in advance


Solution

  • Something like this?

    String s = "Every person haveue280 sumue340 ambition";
    
    // Put a backslash in front of every "u" followed by 4 hexadecimal digits
    s = s.replaceAll("u\\p{XDigit}{4}", "\\\\$0");
    

    which results in

    Every person have\ue280 sum\ue340 ambition
    

    Not sure what you're after, but perhaps it's something like this:

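    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Replace each "u" + 4 hex digits with the character it encodes
    static String toUnicode(String s) {
        Matcher m = Pattern.compile("u(\\p{XDigit}{4})").matcher(s);
        StringBuffer buf = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // Quote the replacement so a decoded '$' or backslash is taken literally
            m.appendReplacement(buf, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(buf);
        return buf.toString();
    }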
    

    (Updated according to axtavt's very nice alternative. Making this CW.)
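
For anyone who wants to try both variants side by side, here is a minimal, self-contained sketch. The class name `UnicodeEscapeDemo` and the method names `addBackslashes` and `decode` are made up for illustration and do not come from the answer. `decode` also wraps the replacement in `Matcher.quoteReplacement`, on the assumption that the input might contain a sequence such as u0024, which decodes to `$` and would otherwise be read as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapeDemo {
    // "u" followed by exactly four hexadecimal digits; group 1 holds the digits.
    private static final Pattern ESCAPE = Pattern.compile("u(\\p{XDigit}{4})");

    // Variant 1: keep the text, but put one literal backslash in front of each match.
    public static String addBackslashes(String s) {
        // In the replacement string, an escaped backslash yields one literal
        // backslash, and $0 stands for the whole match.
        return ESCAPE.matcher(s).replaceAll("\\\\$0");
    }

    // Variant 2: replace each match with the character that the hex digits encode.
    public static String decode(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer buf = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded '$' or backslash from being
            // interpreted as replacement syntax.
            m.appendReplacement(buf, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(buf);
        return buf.toString();
    }

    public static void main(String[] args) {
        String s = "Every person haveue280 sumue340 ambition";
        System.out.println(addBackslashes(s));
        System.out.println(decode("u0041BC"));       // prints "ABC"
        System.out.println(decode("costs u00245"));  // prints "costs $5"
    }
}
```

With the sample text from the question, the first call should print the same line as the result given in the answer. Note that the pattern has no way of telling an intended escape from an ordinary word that happens to contain "u" followed by four hex digits, so both variants assume the input contains no such words.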