Tags: php, amazon-web-services, cpanel, aws-sdk, amazon-glacier

How to upload a .tar.gz file into Amazon Glacier using PHP and aws-SDK v2 with multipart upload?


I am trying to upload a 9 GB .tar.gz file which I created using the cPanel Backup Wizard. This file should be stored as-is on Amazon Glacier, but a single Glacier upload request is limited to 4 GB.

Is there a way to do this using PHP, aws-SDK v2 and uploadMultipartPart?

This is the code I have so far:

<?php    
require 'aws-autoloader.php';

use Aws\Glacier\GlacierClient;
use Aws\Glacier\Model\MultipartUpload\UploadPartGenerator;

//#####################################################################
//SET AMAZON GLACIER VARIABLES
//#####################################################################
$key = 'XXXXXXXXXXXXXXXXX';
$secret = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';
$region = 'us-west-2';
$accountId = 'XXXXXXXXXXXX';
$vaultName = 'XXXXXXXXXXXX';
$partSize = 104857600; // 100 * 1024 * 1024
$fileLocation = 'path/to/.tar.gz file/';

//#####################################################################
//DECLARE THE AMAZON CLIENT
//#####################################################################
$client = GlacierClient::factory(array(
    'key'    => $key,
    'secret' => $secret,
    'region' => $region,
));

//#####################################################################
//GET ALL FILES INTO AN ARRAY
//#####################################################################
$files = scandir($fileLocation);
$filename = $files[2]; // scandir() lists '.' and '..' first, so index 2 is the first real file

//#####################################################################
// USE HELPERS IN THE SDK TO GET INFORMATION ABOUT EACH OF THE PARTS
//#####################################################################
$archiveData = fopen($fileLocation.$filename, 'r');
$parts = UploadPartGenerator::factory($archiveData, $partSize);

//#####################################################################
// INITIATE THE UPLOAD AND GET THE UPLOAD ID
//#####################################################################
$result = $client->initiateMultipartUpload(array(
    'vaultName' =>$vaultName,
    'partSize'  => $partSize,
));
$uploadId = $result->get('uploadId');

//#####################################################################
// UPLOAD EACH PART INDIVIDUALLY USING DATA FROM THE PART GENERATOR
//#####################################################################
$archiveData = fopen($fileLocation.$filename, 'r');
foreach ($parts as $part) {
    set_time_limit (120);
    fseek($archiveData, $part->getOffset());
    $client->uploadMultipartPart(array(
        'vaultName'     => $vaultName,
        'uploadId'      => $uploadId,
        'body'          => fread($archiveData, $part->getSize()),
        'range'         => $part->getFormattedRange(),
        'checksum'      => $part->getChecksum(),
        'ContentSHA256' => $part->getContentHash(),
    ));
}

//#####################################################################
// COMPLETE THE UPLOAD BY USING DATA AGGREGATED BY THE PART GENERATOR
//#####################################################################
$result = $client->completeMultipartUpload(array(
    'vaultName'   => $vaultName,
    'uploadId'    => $uploadId,
    'archiveSize' => $parts->getArchiveSize(),
    'checksum'    => $parts->getRootChecksum(),
));
$archiveId = $result->get('archiveId');

fclose($archiveData);
?>

Solution

  • Note that partSize needs to be n * 1024 * 1024 bytes, where n is a power of 2. You are using 104857600 = 100 * 1024 * 1024; 100 is an even number, but it is not a power of two. See http://docs.aws.amazon.com/amazonglacier/latest/dev/api-multipart-initiate-upload.html

    I don't have a complete answer, but it would help if you specified which error you are getting. (A try/catch sketch that surfaces the error follows after this list.)

    Also from the docs: "The minimum allowable part size is 1 MB, and the maximum is 4 GB (4096 MB)." In other words, 1 <= n <= 4096 and n is a power of 2. So what is a good number to use? I think the idea is to use a smaller n if you run into problems, subject to these constraints (see the part-size sketch after this list):

    • You pay per part: $0.050 per 1,000 requests in US-East.

    • There's a maximum number of parts: 10,000. For your 9 GB upload, using the maximum number of parts works out to a part size of about 966,368 bytes, roughly 0.9 MB. So 0.9 MB is the minimum part size for 9 GB, and you are right to want a part size larger than 1 MB to stay comfortably within the limits.

    • There's also a reason not to use overly large part sizes; it has to do with memory, CPU, and saturating your internet connection. All I can really say is that the software I use (mt-aws-glacier) defaults to 16 MB. Here is a discussion of the tradeoffs on its issue tracker: https://github.com/vsespb/mt-aws-glacier/issues/55
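
    To make those constraints concrete, here is a minimal sketch in plain PHP. The helper name chooseGlacierPartSize is mine, not part of the AWS SDK; it simply picks the smallest part size that is a power-of-two multiple of 1 MB and still keeps the archive within the 10,000-part limit:

    <?php
    // Hypothetical helper (not part of the AWS SDK): find the smallest part size
    // that is a power-of-two multiple of 1 MB (1 MB .. 4096 MB) and still keeps
    // the archive within Glacier's 10,000-part limit.
    function chooseGlacierPartSize($archiveSizeBytes)
    {
        for ($mb = 1; $mb <= 4096; $mb *= 2) {
            $partSize = $mb * 1024 * 1024;
            if (ceil($archiveSizeBytes / $partSize) <= 10000) {
                return $partSize;
            }
        }
        throw new RuntimeException('Archive is too large for a single multipart upload');
    }

    // For the 9 GB backup in the question this returns 1 MB (9 GB / 10,000 parts
    // is roughly 0.9 MB per part), which means about 9,216 parts. Choosing 16 MB
    // by hand instead would cut the part count, and the billed requests, to 576.
    $partSize = chooseGlacierPartSize(9 * 1024 * 1024 * 1024);
    echo ($partSize / (1024 * 1024)) . " MB\n"; // prints "1 MB"

    Whatever value you pick, nothing else in the question's code changes: the same $partSize is passed to both UploadPartGenerator::factory() and initiateMultipartUpload().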
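
    And since the question does not say which error actually comes back, here is the per-part loop from the question wrapped in a try/catch so the failure becomes visible. This is only a sketch that reuses $client, $parts, $archiveData, $vaultName and $uploadId from the question's code; Aws\Common\Exception\ServiceResponseException is the base exception the v2 SDK throws for failed service calls, but verify the class name against your installed SDK version:

    <?php
    use Aws\Common\Exception\ServiceResponseException;

    // Same upload loop as in the question, with the error surfaced on failure.
    foreach ($parts as $part) {
        set_time_limit(120);
        fseek($archiveData, $part->getOffset());
        try {
            $client->uploadMultipartPart(array(
                'vaultName'     => $vaultName,
                'uploadId'      => $uploadId,
                'body'          => fread($archiveData, $part->getSize()),
                'range'         => $part->getFormattedRange(),
                'checksum'      => $part->getChecksum(),
                'ContentSHA256' => $part->getContentHash(),
            ));
        } catch (ServiceResponseException $e) {
            // Report which part failed and what Glacier returned, then rethrow so
            // the script stops instead of completing a broken upload.
            error_log(sprintf('Part %s failed: [%s] %s',
                $part->getFormattedRange(), get_class($e), $e->getMessage()));
            throw $e;
        }
    }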