Can't download XML file


I'm trying to download an XML file from a remote URL, without success. I can see its content in the web browser and can save it manually from there ("Save as"), but I can't download it from the command line. I'm using wget:

wget -q -O test.xml https://example.com/test

I also tried cURL, without success.

Any ideas?


Solution

  • Remove -q and you'll see:

    --2017-04-20 14:25:53--  https://example.com/test
    Resolving example.com... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
    Connecting to example.com|93.184.216.34|:443... connected.
    HTTP request sent, awaiting response... 404 Not Found
    2017-04-20 14:25:53 ERROR 404: Not Found.
    

    The server responds to that URL with a 404 error page, and by default wget skips the body of an error response. Consequently test.xml is empty.

    Then if you look at the manual:

       --content-on-error
           If this is set to on, wget will not skip the content when the
           server responds with a http status code that indicates error.
    

    So:

    wget -q --content-on-error -O test.xml https://example.com/test
    

    … successfully downloads that resource.

    It isn't valid XML though. The HTML 5 Doctype breaks it.
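
Since the question is tagged as a cron task, where -q hides exactly the message that explains the failure, here is a minimal sketch of a wrapper that keeps errors visible. The function names describe_wget_status and fetch_xml are hypothetical, made up for this illustration. The exit codes are the ones listed in the EXIT STATUS section of the wget manual, and -nv is wget's "not verbose, but not quiet" mode.

```shell
#!/bin/sh
# Hypothetical helper: translate a wget exit status into a short message.
# The codes come from the "EXIT STATUS" section of the wget manual.
describe_wget_status() {
    case "$1" in
        0) echo "ok" ;;
        4) echo "network failure" ;;
        5) echo "SSL verification failure" ;;
        8) echo "server issued an error response" ;;
        *) echo "other error ($1)" ;;
    esac
}

# Hypothetical wrapper: download URL $1 into file $2 using -nv instead of -q,
# so errors and basic information still reach the cron mail or log.
fetch_xml() {
    wget -nv -O "$2" "$1"
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "wget failed: $(describe_wget_status "$status")" >&2
    fi
    return "$status"
}
```

It could be called as `fetch_xml https://example.com/test test.xml`. For the URL in the question, the 404 response should show up as exit status 8 ("server issued an error response") rather than as a silently empty file.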