
Checking if AWS S3 presigned link exists using wget --spider


I've read several threads on SO about checking whether a URL exists or not in bash, e.g. #37345831, and the recommended solution was to use wget with --spider. However, the --spider option appears to fail when used with AWS S3 presigned URLs.

Calling:

wget -S --spider "${URL}" 2>&1

Results in:

HTTP request sent, awaiting response...
  HTTP/1.1 403 Forbidden
  x-amz-request-id: [REF]
  x-amz-id-2: [REF]
  Content-Type: application/xml
  Date: [DATE]
Server: AmazonS3
Remote file does not exist -- broken link!!!

Whereas the following returns as expected, HTTP/1.1 200 OK, for the same input URL:

wget -S "${URL}" -O /dev/stdout | head

The version of wget I'm running is:

GNU Wget 1.20.3 built on linux-gnu.

Any clue as to what's going on?


Solution

  • Any clue as to what's going on?

    There are several HTTP request methods, also known as HTTP verbs; two of them are relevant here:

    • GET
    • HEAD

    Unless instructed otherwise, wget issues the first of these. When the --spider option is used it issues the second, to which the server should respond with headers only (no body).
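The difference between the two verbs can be observed directly. A minimal sketch using only the Python standard library: a local stub server records which verb each client request used, mimicking a plain fetch (GET) versus a spider-style check (HEAD). The server, URL, and handler here are illustrative, not part of wget itself.

```python
import http.server
import threading
import urllib.request

seen = []  # HTTP verbs observed by the server, in order

class Handler(http.server.BaseHTTPRequestHandler):
    def _reply(self):
        seen.append(self.command)              # record the HTTP verb used
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        if self.command == "GET":              # a HEAD response carries no body
            self.wfile.write(b"ok")
    do_GET = do_HEAD = _reply
    def log_message(self, *args):              # silence per-request logging
        pass

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

# Plain download (like `wget URL`) -> GET
urllib.request.urlopen(url).read()
# Existence check (like `wget --spider URL`) -> HEAD
urllib.request.urlopen(urllib.request.Request(url, method="HEAD"))
server.shutdown()

print(seen)  # ['GET', 'HEAD']
```

From the server's point of view the two checks are different requests, which is exactly why a signature bound to one verb can reject the other.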

    AWS S3 presigned link

    According to Signing and authenticating REST requests - Amazon Simple Storage Service, one step of preparing the signature is as follows:

    StringToSign = HTTP-Verb + "\n" +
        Content-MD5 + "\n" +
        Content-Type + "\n" +
        Date + "\n" +
        CanonicalizedAmzHeaders +
        CanonicalizedResource;
    

    therefore we might conclude that an AWS S3 presigned link works with exactly one HTTP verb. The one you have was signed for GET. Ask whoever crafted that link to furnish you with an AWS S3 presigned link made for HEAD if you wish to use --spider successfully.
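    To see why the verb is baked in, here is a hedged sketch of the SigV2-style computation above, using only the standard library. The bucket, key, expiry, and secret are dummy values for illustration; the point is that signing the same resource with GET versus HEAD yields different signatures, so a URL presigned for GET cannot validate a HEAD request.

    ```python
    import base64
    import hashlib
    import hmac

    def sign_v2(verb, resource, expires, secret_key):
        # Presigned-URL StringToSign per the scheme quoted above:
        # the HTTP verb is the very first field, so it is part of the
        # signed material (Content-MD5 and Content-Type left empty here).
        string_to_sign = f"{verb}\n\n\n{expires}\n{resource}"
        digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                          hashlib.sha1).digest()
        return base64.b64encode(digest).decode()

    # Dummy, illustrative inputs -- not real AWS credentials
    sig_get = sign_v2("GET", "/my-bucket/my-key", 1700000000, "dummy-secret")
    sig_head = sign_v2("HEAD", "/my-bucket/my-key", 1700000000, "dummy-secret")

    assert sig_get != sig_head  # same resource, different verb, different signature
    ```

    If a HEAD-signed URL is not available, a workaround is to perform the GET the signature was made for while discarding the body, e.g. `curl -s -o /dev/null -w "%{http_code}" "${URL}"`, and inspect the status code.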