amazon-web-services s3cmd

s4cmd sync two buckets access denied


I am trying to sync two s3 buckets:

s4cmd --dry-run sync s3://cgl-rnaseq-recompute-fixed/gtex s3://rnaseq.toil.20k/gtex

But I am getting the following error:

[Exception] An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied
[Thread Failure] An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied

The source bucket is publicly available. The second bucket is mine and I have access to it:

[centos@ip-172-30-3-12 data]$ s4cmd ls s3://rnaseq.toil.20k/
                 DIR s3://rnaseq.toil.20k/gtex/
                 DIR s3://rnaseq.toil.20k/pnoc/
                 DIR s3://rnaseq.toil.20k/target/
                 DIR s3://rnaseq.toil.20k/tcga/

Also, I cannot run ls on the source bucket with s4cmd, but I can with s3cmd:

[centos@ip-172-30-3-12 data]$ s4cmd ls s3://cgl-rnaseq-recompute-fixed/gtex
[Exception] An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied
[Thread Failure] An error occurred (AccessDenied) when calling the ListObjects operation: Access Denied

[centos@ip-172-30-3-12 data]$ s3cmd ls --requester-pays s3://cgl-rnaseq-recompute-fixed/gtex
                       DIR   s3://cgl-rnaseq-recompute-fixed/gtex/
2016-06-03 17:02    435553   s3://cgl-rnaseq-recompute-fixed/gtex-manifest

What could be going wrong? Any suggestions would be much appreciated.


Solution

  • To get the same behavior as s3cmd, use a wildcard on the source path:

    s4cmd sync s3://bucket/path/dirA/* s3://bucket/path/dirB/
    

    Note that s4cmd does not interpret dirA without a trailing slash as dirA/*, the way rsync does.

    So in your case you have to use:

    s4cmd --dry-run sync s3://cgl-rnaseq-recompute-fixed/gtex/* s3://rnaseq.toil.20k/gtex
    

    Also check the s4cmd documentation; it is very helpful:

    https://github.com/bloomreach/s4cmd
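
If you script this, a small wrapper can append the wildcard for you. The following is only a sketch: `s4_wildcard_src` is a hypothetical helper name, not something s4cmd provides, and it prints the resulting command instead of running it, so it needs no AWS access. It assumes the source is a "directory" prefix like the one in the question. The wildcard is kept in quotes so that the local shell does not try to expand it before s4cmd sees it.

```shell
#!/bin/sh
# Hypothetical helper (not part of s4cmd): rewrite a source prefix into the
# wildcard form recommended above, e.g. .../gtex or .../gtex/ -> .../gtex/*
s4_wildcard_src() {
    src=$1
    case "$src" in
        *'*'*) ;;                  # already contains a wildcard: leave as-is
        *) src="${src%/}/*" ;;     # drop one trailing slash, then append /*
    esac
    printf '%s\n' "$src"
}

SRC=$(s4_wildcard_src "s3://cgl-rnaseq-recompute-fixed/gtex")
DST="s3://rnaseq.toil.20k/gtex"

# Print the command instead of executing it; paste it into a shell to run it.
printf 's4cmd --dry-run sync "%s" "%s"\n' "$SRC" "$DST"
```

Once the dry run looks right, drop `--dry-run` from the printed command to perform the actual sync.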