Tags: java, regex, actionscript-3, replace, xml-entities

Java RegEx: Replace all xml characters with their entity number


I am trying to port a function I wrote in ActionScript to Java and I am having a bit of trouble. I have included the function below. I found this response to question #375420, but do I really need to write a separate class? Thanks.

public static function replaceXML(str:String):String {
  return str.replace(/[\"'&<>]/g, function($0:String):String {
    return StringUtil.substitute('&#{0};', $0.charCodeAt(0));
  });
}
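For comparison, here is a minimal sketch of a direct Java translation. It assumes Java 9 or later, where `Matcher.replaceAll(Function<MatchResult, String>)` takes a lambda that is called once per match, much like the callback passed to `str.replace()` above, so no separate class is needed. The wrapper class name `XmlEntities` is hypothetical.

```java
import java.util.regex.Pattern;

public class XmlEntities {
    // Characters to escape, mirroring the ActionScript class /[\"'&<>]/g
    private static final Pattern SPECIAL = Pattern.compile("[\"'&<>]");

    // Requires Java 9+: Matcher.replaceAll(Function<MatchResult, String>)
    // invokes the lambda for each match and splices its result into the output.
    public static String replaceXML(String str) {
        return SPECIAL.matcher(str)
                .replaceAll(mr -> "&#" + (int) mr.group().charAt(0) + ";");
    }

    public static void main(String[] args) {
        String input = "<root><child id=\"foo\">Bar</child></root>";
        // Prints: &#60;root&#62;&#60;child id=&#34;foo&#34;&#62;Bar&#60;/child&#62;&#60;/root&#62;
        System.out.println(replaceXML(input));
    }
}
```

Because the replacement is computed per match, the `&` in an already-emitted `&#60;` is never seen by the pattern again.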

Input

<root><child id="foo">Bar</child></root>

Output

&#60;root&#62;&#60;child id=&#34;foo&#34;&#62;Bar&#60;/child&#62;&#60;/root&#62;

UPDATE

Here is my solution, if anyone is wondering. Thanks, Sri Harsha Chilakapati.

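import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static String replaceXML(final String inputStr) {
  // One pass with appendReplacement: each match is replaced exactly once, so
  // '&' characters introduced by earlier replacements are never re-escaped.
  Matcher m = Pattern.compile("[&<>'\"]").matcher(inputStr);
  StringBuffer sb = new StringBuffer();
  while (m.find()) {
    // e.g. '<' becomes "&#60;"
    m.appendReplacement(sb, String.format("&#%d;", (int) m.group().charAt(0)));
  }
  m.appendTail(sb);
  return sb.toString();
}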
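If you would rather avoid regular expressions altogether, a single pass over the characters does the same job. This is a different technique from the Matcher loop above, offered only as a sketch; the class name `XmlEscape` is hypothetical. Because each character is visited exactly once, an `&` produced by an earlier replacement is never escaped a second time.

```java
public class XmlEscape {
    // Regex-free alternative: walk the string once and append either the
    // character itself or its numeric character reference.
    public static String replaceXML(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': case '<': case '>': case '\'': case '"':
                    // (int) c is the character's code, e.g. '<' -> 60
                    out.append("&#").append((int) c).append(';');
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: a &#60; b &#38;&#38; c &#62; &#34;d&#34;
        System.out.println(replaceXML("a < b && c > \"d\""));
    }
}
```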

Solution

  • You can use regex for that.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    String myString = "<root><child id=\"foo\">Bar</child></root>";
    
    // Matches any character that is not a letter, a digit, or one of ; " + * / -
    Matcher m = Pattern.compile("[^\\p{L}\\p{N};\"+*/-]").matcher(myString);
    StringBuffer sb = new StringBuffer();
    
    while (m.find()) {
        // Replace each match in place with its numeric character reference
        m.appendReplacement(sb, "&#" + (int) m.group().charAt(0) + ";");
    }
    m.appendTail(sb);
    
    System.out.println(sb.toString());
    

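    Output:

    &#60;root&#62;&#60;child&#32;id&#61;"foo"&#62;Bar&#60;/child&#62;&#60;/root&#62;

    Note that this pattern escapes every character that is not a letter, a digit, or one of ; " + * / -, so spaces and = are escaped but double quotes are not. To reproduce the output in the question exactly, use the character class [&<>'"] instead.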