java gzipinputstream

How to read a compressed HTML page with Content-Encoding: gzip


I request a web page that sends a Content-Encoding: gzip header, but I am stuck on how to read it.

My code:

    try {
        URLConnection connection = new URL("http://jquery.org").openConnection();
        String html = "";
        BufferedReader in = null;
        connection.setReadTimeout(10000);
        in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            html += inputLine + "\n";
        }
        in.close();
        System.out.println(html);
        System.exit(0);
    } catch (IOException ex) {
        Logger.getLogger(Crawler.class.getName()).log(Level.SEVERE, null, ex);
    }

The output looks very messy. (I was unable to paste it here; it is a jumble of symbols.)

I believe this is compressed content. How do I decode it?

Note:
If I change jquery.org to jquery.com (which doesn't send that header), my code works well.


Solution

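  • There is a class for this: java.util.zip.GZIPInputStream. It is an InputStream, so it drops in transparently: wrap connection.getInputStream() in it before passing the stream to the InputStreamReader, and the reader receives the decompressed bytes.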
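
A minimal sketch of how that wrapping might look. The class name GzipPageReader and the helpers decode, readBody and fetch are hypothetical names chosen for illustration, and UTF-8 is an assumption (the real charset should come from the Content-Type header). The stream is only wrapped when the response actually declares gzip, so uncompressed pages such as jquery.com still read normally.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class; not part of the original question or answer.
class GzipPageReader {

    // Wrap the raw stream in a GZIPInputStream only when the header says "gzip".
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Read the whole (possibly compressed) body as text. UTF-8 is an assumption.
    static String readBody(InputStream raw, String contentEncoding) throws IOException {
        StringBuilder html = new StringBuilder();
        try (Reader in = new InputStreamReader(decode(raw, contentEncoding), StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                html.append(buffer, 0, n);
            }
        }
        return html.toString();
    }

    // Fetch a page the way the question does, but decode gzip when it is declared.
    static String fetch(String url) throws IOException {
        URLConnection connection = new URL(url).openConnection();
        connection.setReadTimeout(10000);
        return readBody(connection.getInputStream(), connection.getContentEncoding());
    }
}
```

With this in place, the body of the question's try block could shrink to System.out.println(GzipPageReader.fetch("http://jquery.org")). Checking getContentEncoding() rather than always wrapping avoids the "Not in GZIP format" error that GZIPInputStream raises on uncompressed responses.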