I am trying to upload a 9 GB .tar.gz file that I created with the cPanel Backup Wizard. The file should be stored as is on Amazon Glacier, but Amazon Glacier has an upload limit of 4 GB.
Is there a way to do this using PHP, the AWS SDK v2 and uploadMultipartPart?
This is the code I have so far:
<?php
require 'aws-autoloader.php';
use Aws\Glacier\GlacierClient;
use Aws\Glacier\Model\MultipartUpload\UploadPartGenerator;
//#####################################################################
//SET AMAZON GLACIER VARIABLES
//#####################################################################
$key = 'XXXXXXXXXXXXXXXXX';
$secret = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';
$region = 'us-west-2';
$accountId = 'XXXXXXXXXXXX';
$vaultName = 'XXXXXXXXXXXX';
$partSize = 4 * 1024 * 1024;
$fileLocation = 'path/to/.tar.gz file/';
//#####################################################################
//DECLARE THE AMAZON CLIENT
//#####################################################################
$client = GlacierClient::factory(array(
'key' => $key,
'secret' => $secret,
'region' => $region,
));
//#####################################################################
//GET ALL FILES INTO AN ARRAY
//#####################################################################
$files = scandir($fileLocation);
$filename = $files[2];
//#####################################################################
// USE HELPERS IN THE SDK TO GET INFORMATION ABOUT EACH OF THE PARTS
//#####################################################################
$archiveData = fopen($fileLocation.$filename, 'r');
$parts = UploadPartGenerator::factory($archiveData, $partSize);
//#####################################################################
// INITIATE THE UPLOAD AND GET THE UPLOAD ID
//#####################################################################
$result = $client->initiateMultipartUpload(array(
'vaultName' =>$vaultName,
'partSize' => $partSize,
));
$uploadId = $result->get('uploadId');
//#####################################################################
// UPLOAD EACH PART INDIVIDUALLY USING DATA FROM THE PART GENERATOR
//#####################################################################
$archiveData = fopen($fileLocation.$filename, 'r');
foreach ($parts as $part) {
set_time_limit (120);
fseek($archiveData, $part->getOffset());
$client->uploadMultipartPart(array(
'vaultName' => $vaultName,
'uploadId' => $uploadId,
'body' => fread($archiveData, $part->getSize()),
'range' => $part->getFormattedRange(),
'checksum' => $part->getChecksum(),
'ContentSHA256' => $part->getContentHash(),
));
}
//#####################################################################
// COMPLETE THE UPLOAD BY USING DATA AGGREGATED BY THE PART GENERATOR
//#####################################################################
$result = $client->completeMultipartUpload(array(
'vaultName' => $vaultName,
'uploadId' => $uploadId,
'archiveSize' => $parts->getArchiveSize(),
'checksum' => $parts->getRootChecksum(),
));
$archiveId = $result->get('archiveId');
fclose($archiveData);
?>
Note that partSize needs to be n * 1024 * 1024, where n is a power of 2. You're using 104857600 = 100 * 1024 * 1024. Your n is an even number, but not a power of two. See http://docs.aws.amazon.com/amazonglacier/latest/dev/api-multipart-initiate-upload.html
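For what it's worth, the power-of-two rule can be checked before calling initiateMultipartUpload. This is only a sketch: isValidGlacierPartSize is a made-up helper name, not part of the AWS SDK, and it assumes PHP 7 or later on a 64-bit build. The 1 MB to 4 GB bounds are the ones given on the same documentation page.

```php
<?php
// Hypothetical helper (not part of the AWS SDK): returns true when $bytes is
// 1 MiB times a power of two and lies between 1 MiB and 4 GiB (4096 MiB).
function isValidGlacierPartSize(int $bytes): bool
{
    $mib = 1024 * 1024;

    // Must be a whole number of MiB within the documented range.
    if ($bytes < $mib || $bytes > 4096 * $mib || $bytes % $mib !== 0) {
        return false;
    }

    $n = intdiv($bytes, $mib);

    // A power of two has exactly one bit set, so n & (n - 1) is zero.
    return ($n & ($n - 1)) === 0;
}

var_dump(isValidGlacierPartSize(100 * 1024 * 1024)); // bool(false): 100 is not a power of two
var_dump(isValidGlacierPartSize(4 * 1024 * 1024));   // bool(true)
var_dump(isValidGlacierPartSize(128 * 1024 * 1024)); // bool(true)
```

Running a check like this up front fails fast locally instead of waiting for the service to reject the request.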
I don't have a complete answer, but it would help if you specified what error you are getting.
Also from the docs: "The minimum allowable part size is 1 MB, and the maximum is 4 GB (4096 MB)." In other words, n >= 1, n <= 4096, and n is a power of 2. So what's a good number to use? I think the idea is to use a smaller n if you have problems, subject to these constraints:
You pay per part: $0.050 per 1,000 requests in US-East.
There's a maximum number of parts: 10,000. For your 9 GB upload, using the maximum number of parts works out to a part size of 966367 bytes, or about 0.9 MB. So 0.9 MB is the smallest part size that could work for 9 GB, which is already below the 1 MB minimum. You are right to want to use a larger part size than 1 MB to be comfortably within the limits.
There's also a reason not to use overly large part sizes. It has something to do with memory, CPU and saturating your internet connection. All I can really say is that the software I use defaults to 16 MB. Here is a discussion of the tradeoffs on its issue tracker: https://github.com/vsespb/mt-aws-glacier/issues/55
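To put numbers on those constraints, here is a rough sketch that finds the smallest allowed part size for an archive and counts the upload requests at a few part sizes. The function names are hypothetical (they are not from the AWS SDK), the price is the US-East figure quoted above and may well be out of date, and PHP 7 or later on a 64-bit build is assumed.

```php
<?php
// Hypothetical helper: number of parts needed, using integer ceiling division.
function glacierPartCount(int $archiveBytes, int $partSize): int
{
    return intdiv($archiveBytes + $partSize - 1, $partSize);
}

// Hypothetical helper: smallest part size (in bytes) that is 1 MiB times a
// power of two and keeps the archive within $maxParts parts.
function smallestGlacierPartSize(int $archiveBytes, int $maxParts = 10000): int
{
    $mib = 1024 * 1024;

    // Try 1 MiB, 2 MiB, 4 MiB, ... up to the 4096 MiB maximum part size.
    for ($partSize = $mib; $partSize <= 4096 * $mib; $partSize *= 2) {
        if (glacierPartCount($archiveBytes, $partSize) <= $maxParts) {
            return $partSize;
        }
    }

    throw new InvalidArgumentException('Archive is too large for one multipart upload');
}

$mib = 1024 * 1024;
$archiveBytes = 9 * 1024 * $mib;   // the 9 GB archive from the question
$usdPer1000Requests = 0.05;        // assumption: the US-East price quoted above

echo 'Smallest allowed part size: ', smallestGlacierPartSize($archiveBytes) / $mib, " MiB\n";

foreach ([1, 4, 16, 64] as $n) {
    $parts = glacierPartCount($archiveBytes, $n * $mib);
    $cost = $parts / 1000 * $usdPer1000Requests;
    printf("%2d MiB parts: %4d requests, about %.4f USD\n", $n, $parts, $cost);
}
```

For the 9 GB case this reports 1 MiB as the floor, and a 16 MiB part size needs 576 part requests, so the request cost is small either way and the choice mostly comes down to memory use and retry granularity.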