http-headers wget corruption

wget save-headers and file corruption?


Running wget --save-headers leaves the response headers at the top of the downloaded file. However, it seems that any file downloaded with this option is corrupt, even after the headers are removed.

$ wget svnpenn.github.io/img/2012/git.jpg

$ wget --save-headers -O- svnpenn.github.io/img/2012/git.jpg | sed '1,/^$/d' > git2.jpg

$ ls -l
total 136
-rw-r--r--+ 1 Steven None 65755 Jul  4 21:58 git.jpg
-rw-r--r--+ 1 Steven None 65753 Jul  7 11:35 git2.jpg


Solution

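  • It looks like the sed command is what corrupts the file. If I strip the headers with a hex editor instead, the image works fine. HTTP header lines end in CRLF, so the blank line that separates the headers from the body still contains a carriage return. Matching on that carriage return and running sed in binary mode (-b) leaves the image data untouched. The following command works for me.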

    $ wget --save-headers svnpenn.github.io/img/2012/git.jpg
    HTTP request sent, awaiting response... 200 OK
    Length: 65755 (64K) [image/jpeg]
    
    $ sed -b '1,/^\r/d' git.jpg > good.jpg
    
    $ ls -l
    total 136
    -rw-r--r--+ 1 Steven None 66044 Jul  8 18:17 git.jpg
    -rw-r--r--+ 1 Steven None 65755 Jul  9 17:53 good.jpg
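
The same idea can be checked without a network connection. The sketch below is only an illustration under stated assumptions: it assumes GNU sed (for -b and the \r escape), the file names with_headers.bin, expected_body.bin and body.bin are made up for the example, and the "body" is a few hand-written bytes rather than a real JPEG. It uses the slightly stricter pattern ^\r$ so that only a line consisting of a lone carriage return ends the header block.

```shell
#!/bin/sh
# Stand-in for a `wget --save-headers` download (hypothetical file names):
# CRLF-terminated header lines, an empty CRLF line, then the raw body.
printf 'HTTP/1.1 200 OK\r\nContent-Type: image/jpeg\r\n\r\n' > with_headers.bin

# The body deliberately contains a CR LF pair and an empty line,
# as arbitrary binary data may, and has no trailing newline.
printf 'abc\r\ndef\n\nghi' > expected_body.bin
cat expected_body.bin >> with_headers.bin

# Delete from line 1 through the first line that is a lone CR.
# -b (GNU sed) opens the files in binary mode so CR LF is not treated specially.
sed -b '1,/^\r$/d' with_headers.bin > body.bin

# cmp exits 0 only when the two files are byte-identical.
cmp body.bin expected_body.bin && echo "body intact"
```

If cmp reports a difference on your platform, the sed in use is probably not GNU sed or is translating line endings, which is the same kind of corruption seen in the question.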