I'm trying to download some URLs using wget. I get files with no problem except for this link Offensive-Security-ICQ and any other link on www.offensive-security.com.
I tried on both Linux and Windows, with many attempts and a lot of searching, but in vain.
I use this command: "wget https://www.offensive-security.com/pwbonline/icq.html"
The resulting file contains only unreadable symbols when opened as ANSI-encoded text.
How can I solve this problem?
For some reason, the server does not return the plain HTML page but a gzip-compressed version of it. The file command identifies the downloaded file as gzip compressed data:
$ file icq.html
icq.html: gzip compressed data, from Unix
So you can simply decompress it with gunzip (after renaming it to icq.html.gz) to get the correct HTML page.
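As a minimal sketch of that decompression step, assuming gzip is installed: the helper below checks whether a file really is gzip data and, if so, replaces it with its decompressed content, whatever its extension. The name decompress_if_gzip is made up for illustration and is not a standard tool.

```shell
# Hypothetical helper: decompress a file in place if it holds gzip data,
# regardless of its extension. Files that are not gzip are left untouched.
decompress_if_gzip() {
    f="$1"
    # gzip -t exits 0 only when its input is valid gzip data.
    # Reading from stdin avoids any dependence on a .gz suffix.
    if [ -f "$f" ] && gzip -t < "$f" 2>/dev/null; then
        # -d decompresses, -c writes to stdout; write to a temporary file first
        gzip -dc < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    fi
}

# Does nothing if icq.html is missing or is already plain HTML
decompress_if_gzip icq.html
```

After running it, "file icq.html" should report HTML text instead of gzip compressed data.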
Why the server does this: I'm not sure, but it is probably a default setting that was left as is, so that pages download faster.
How to download the HTML content directly: probably by sending a common user agent and header, so that the server thinks the request comes from an ordinary web browser rather than a download tool.
This can be done with wget using some options. For example, this should work:
wget --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0" https://www.offensive-security.com/pwbonline/icq.html