
Upload via presigned request - 403 forbidden for Unicode Filename


I have just run into a weird issue where S3 blocks requests (403 Forbidden, no response body) if the filename contains Unicode codepoints.

Backend code that generates the presigned URL:

$userId = ...; // Irrelevant.
$filename = ...; // Comes from POST data.
$safeName = trim(preg_replace('/[^a-z0-9\-_.]/i', '-', $filename), '-'); // AWS only allows specific characters in the key.
$key = sprintf('user-documents/%s/%s', $userId, $safeName);

$metadata = [
    'type'     => 'USER_DOCUMENT',
    'userId'   => $userId,
    'filename' => $filename, // The raw one from POST.
];
$s3 = new S3Client([
    'region'      => getenv('AWS_REGION'),
    'version'     => 'latest',
    'credentials' => CredentialProvider::env(),
]);
$uploadUrl = $s3->createPresignedRequest(
    $s3->getCommand('PutObject', [
        'Bucket'   => getenv('AWS_BUCKET_USER_DATA'),
        'Key'      => $key,
        'Metadata' => $metadata,
    ]),
    '+1 hour',
)->getUri();

$response = [
    'uploadUrl' => $uploadUrl,
    'metadata'  => $metadata,
];

Frontend code that uploads the files to S3:

const file = fileInput.files[0];
const response = await getUploadUrl(file.name); // This is where the POST filename comes from.
const resp = await fetch(response.uploadUrl, {
    method: 'PUT',
    headers: {
        'x-amz-meta-type': response.metadata.type,
        'x-amz-meta-userid': response.metadata.userId,
        'x-amz-meta-filename': response.metadata.filename,
    },
    body: file,
});
if (!resp.ok) {
    throw new Error(`File upload failed: ${resp.status} ${resp.statusText}`);
}

This code works completely fine if the filenames are ASCII-only, but if a filename contains a Unicode letter, e.g. ä, the uploadUrl request returns a 403 Forbidden response without a body, which turns debugging into guessing. I only tried changing ä into a because another StackOverflow question had answers mentioning filename URL encoding, and that change made the upload work.

So the question is - what do I need to change in this code to avoid this issue? I'm not even sure where the problem lies: uploadUrl contains the URL-encoded original filename, which AWS presumably decodes on their end (it's a query parameter, of course it should be URL-decoded; it's their own SDK that encodes it!), and the metadata headers also contain the original filename (non-encoded).

aws/[email protected], if that changes anything.


Solution

  • Following the information about RFC 2047 in @hakre's answer, I found the following package that encodes strings into RFC 2047 "Q" encoding, but it pulls in another dependency. IMO that's too much for a few characters in rare cases, but it might be useful to other people, so I'm leaving it here for them. With this library, one would only need to escape the header values in the PUT request.

    Instead of adding multiple new dependencies, I chose to convert those characters to ASCII using this package, which I'm already using in the backend.

    The required changes then look like this:

    $pathinfo = pathinfo($filename);
    $safeName = (new Slugify())->slugify($pathinfo['filename']) . '.' . $pathinfo['extension'];
    ...
    $metadata = [
        ...
        'filename' => $safeName, // Changed from `$filename`.
    ];
    

    Everything else stays the same. This has the unfortunate drawback that the original filenames will not be preserved anywhere (AWS does not allow Unicode characters in the key, and metadata headers are not URL-decoded on the AWS end), which could be a problem if I were dealing with filenames in Arabic or something like that, but I'm not, so in my case it's close enough (Extended Latin to ASCII).
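    For completeness, one dependency-free way to keep the original name recoverable would be to percent-encode it into the metadata value on both ends: the wire value stays pure ASCII, and your own code decodes it when reading the object back. A sketch of the idea (not what the solution above does):

```javascript
// Round-trip sketch: the encoded form is plain ASCII, so it would be safe as a
// signed x-amz-meta-* header value; decoding is the caller's responsibility.
const original = 'bericht-ä.pdf';
const wireValue = encodeURIComponent(original);
const restored = decodeURIComponent(wireValue);

console.log(wireValue); // "bericht-%C3%A4.pdf"
console.log(restored);  // "bericht-ä.pdf"
```

    Note that the backend would have to put the already-encoded value into `Metadata` before signing, otherwise the signature would mismatch just like before.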

    It's a damn shame that fetch() throws an error instead of just Q-encoding the headers the way they're supposed to be encoded...