
Encode URL with US-ASCII character set


I refer to the following web site:

http://coderstoolbox.net/string/#!encoding=xml&action=encode&charset=us_ascii

Choosing "URL", "Encode", and "US-ASCII", the input is converted to the desired output.

How do I produce the same output in Java?

Thanks in advance.


Solution

  • I used this and it seems to work fine.

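    // Requires: import java.util.stream.Collectors;
    public static String encode(String input) {
        return input.chars().mapToObj(c -> {
            // ASCII letters and digits pass through unchanged (case preserved).
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                return String.valueOf((char) c);
            }
            // Upper-case only the hex digits; pad values below 0x10 to two digits.
            // Below 256: %XX. From 256 up: %U followed by the hex value.
            return c < 256 ? String.format("%%%02X", c) : String.format("%%U%X", c);
        }).collect(Collectors.joining());
    }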
    

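    PS: The 256 threshold limits the U prefix to characters outside the single-byte range (0 to 255); anything below 256 is written as a plain %XX escape. Note that US-ASCII itself only covers 0 to 127, so characters from 128 to 255 are not ASCII even though they get no U prefix.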
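
    To see where the US-ASCII boundary actually sits, the JDK's own charset encoder can be asked directly. This is only a small sketch: the class name AsciiRangeSketch and the method isUsAscii are made up for illustration and are not part of the method above.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original answer.
class AsciiRangeSketch {

    // True if the character can be represented in US-ASCII (code points 0 to 127).
    static boolean isUsAscii(char c) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        System.out.println(isUsAscii('~'));      // true  (0x7E)
        System.out.println(isUsAscii('\u00e9')); // false (0xE9, outside ASCII but below 256)
        System.out.println(isUsAscii('\u20ac')); // false (0x20AC)
    }
}
```

    A check like this could replace the 256 comparison if the goal is to treat everything outside US-ASCII the same way.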


    Alternate option:

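    There is also a built-in Java class, java.net.URLEncoder, that performs URL encoding, but it follows the HTML form rules (application/x-www-form-urlencoded), so its output differs: a space becomes + rather than %20, and the characters ".", "-", "*" and "_" are left unencoded. Characters that US-ASCII cannot represent are replaced with ? and come out as %3F. The overload that takes a charset name declares the checked UnsupportedEncodingException, so the call has to be wrapped in try/catch or the exception declared. See if it helps: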

    String encoded = URLEncoder.encode(input, "US-ASCII");
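
    If the + for spaces is the only difference that matters, one common workaround is to post-process the URLEncoder output. The sketch below assumes Java 10 or later (for the overload that takes a Charset and so avoids the checked exception); the class name UrlEncoderSketch and the method encodeWithPercent20 are hypothetical. It will probably still not match the web tool exactly, because URLEncoder keeps ".", "-", "*" and "_" as they are.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
class UrlEncoderSketch {

    // Encodes with URLEncoder, then rewrites the form-style "+" as "%20".
    // A literal '+' in the input is already encoded as "%2B", so every "+"
    // remaining in the output stands for a space.
    static String encodeWithPercent20(String input) {
        return URLEncoder.encode(input, StandardCharsets.US_ASCII).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(encodeWithPercent20("a b+c"));   // a%20b%2Bc
        System.out.println(encodeWithPercent20("x=1&y=2")); // x%3D1%26y%3D2
    }
}
```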
    

    Hope this helps!