java, url, url-encoding

Remove hexadecimal characters from URL


Please bear with me on this trivial question.

I am getting some URLs like "SOME_DOMAIN?q\x3dnintendo+mathe\x26um\x3d1\x26ie\x3dUTF-8\x26tbm\x3dshop\x26cid\x3d8123694338777545283\x26sa\x3dX\x26ei\x3dL8cjUJmHO8L30gGa1ICgCw\x26ved\x3d0CI4BEIIIMAk", which contain hexadecimal escape sequences such as \x3d and \x26.

What is the best way to remove these hexadecimal escapes? The snippet below solves my problem for now, but it doesn't look like a reliable solution because it only handles the escapes I have listed by hand.

    url = url.replace("\\x2F","/");
    url = url.replace("\\x26","&");
    url = url.replace("\\x3d","=");

I haven't run into it yet, but spaces might also appear in the URL. Would URLDecoder.decode solve my problem?

Kindly advise.

Thanks


Solution

  • This works

       URLDecoder.decode(yourURLString.replace("\\x", "%"), "UTF-8")
    

    See this in action :)

    import java.io.UnsupportedEncodingException;
    import java.net.URLDecoder;

    public class Main {
        public static void main(String[] args) throws UnsupportedEncodingException {
            String s = "SOME_DOMAIN?q\\x3dnintendo+mathe\\x26um\\x3d1\\x26ie\\x3dUTF-8\\x26tbm\\x3dshop\\x26cid\\x3d8123694338777545283\\x26sa\\x3dX\\x26ei\\x3dL8cjUJmHO8L30gGa1ICgCw\\x26ved\\x3d0CI4BEIIIMAk";
            // Turn each \xNN escape into %NN so that URLDecoder can decode it
            System.out.println(URLDecoder.decode(s.replace("\\x", "%"), "UTF-8"));
        }
    }
    

    which prints

    SOME_DOMAIN?q=nintendo mathe&um=1&ie=UTF-8&tbm=shop&cid=8123694338777545283&sa=X&ei=L8cjUJmHO8L30gGa1ICgCw&ved=0CI4BEIIIMAk
    

    Basically, you need to replace \x with % and decode it using:

     URLDecoder.decode(url, "UTF-8");
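
On the question about spaces: the sketch below wraps the two steps above in a helper and shows how `URLDecoder` treats a few inputs. The class name `HexEscapedUrl` and the sample strings are made up for illustration. It assumes the escapes are UTF-8 bytes, as in the answer above. Note that `URLDecoder` also turns every `+` into a space and rejects a stray `%`, which may or may not be what you want.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class HexEscapedUrl {
    // Hypothetical helper: rewrite \xNN as %NN, then percent-decode as UTF-8.
    static String decode(String url) {
        try {
            return URLDecoder.decode(url.replace("\\x", "%"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // '+' is decoded to a space
        System.out.println(decode("q\\x3dnintendo+mathe"));      // q=nintendo mathe
        // \x20 becomes %20, which is also decoded to a space
        System.out.println(decode("q\\x3dnintendo\\x20mathe"));  // q=nintendo mathe
        try {
            // A literal '%' that is not part of an escape is rejected
            decode("discount\\x3d100%");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: incomplete % escape");
        }
    }
}
```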
    

    See the URLDecoder documentation:

    http://docs.oracle.com/javase/1.5.0/docs/api/java/net/URLDecoder.html#decode%28java.lang.String,%20java.lang.String%29
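
If the URLs may contain a literal `+` or `%` that should stay untouched, an alternative is to decode only the `\xNN` escapes and leave everything else alone. This is a minimal sketch of that idea, not part of the answer above: `HexEscapes`, `unescape`, and `isHexEscapeAt` are hypothetical names, and it assumes the escaped bytes are UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class HexEscapes {
    // Replace each \xNN with the byte it encodes; copy every other character as-is.
    static String unescape(String input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < input.length()) {
            if (isHexEscapeAt(input, i)) {
                // Parse the two hex digits after "\x" into one byte
                out.write(Integer.parseInt(input.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                // Copy the current code point unchanged, encoded as UTF-8
                int cp = input.codePointAt(i);
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                out.write(bytes, 0, bytes.length);
                i += Character.charCount(cp);
            }
        }
        // Decode all bytes together so multi-byte sequences such as \xc3\xa9 work
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // True if s has a backslash, an 'x', and two hex digits starting at index i.
    static boolean isHexEscapeAt(String s, int i) {
        return i + 3 < s.length()
                && s.charAt(i) == '\\'
                && s.charAt(i + 1) == 'x'
                && Character.digit(s.charAt(i + 2), 16) >= 0
                && Character.digit(s.charAt(i + 3), 16) >= 0;
    }

    public static void main(String[] args) {
        // '+' is left alone here, unlike with URLDecoder
        System.out.println(unescape("q\\x3dnintendo+mathe\\x26um\\x3d1")); // q=nintendo+mathe&um=1
    }
}
```

Unlike the `URLDecoder` approach, this leaves `+` as `+` and does not fail on a stray `%`, so the result can still be passed to a real URL parser afterwards.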