amazon-web-services amazon-s3 opendata

How can I download an AWS open data set to my machine?


I would like to use the AWS CLI to download a publicly available data set to my machine, but the Registry of Open Data only provides an Amazon Resource Name (ARN) and no URL, so I do not know how to do it.


Solution

  • For an S3 bucket, the bucket name is the part of the ARN that follows arn:aws:s3:::. For example, for Therapeutically Applicable Research to Generate Effective Treatments (TARGET) the ARN is:

    arn:aws:s3:::gdc-target-phs000218-2-open

    Thus, the bucket name is

    gdc-target-phs000218-2-open

    To list its contents:

    aws s3 ls s3://gdc-target-phs000218-2-open
    

    To copy it to your current local directory (a large dataset can take a long time to download):

    aws s3 sync s3://gdc-target-phs000218-2-open .
    

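    or (sync copies only objects that are missing or changed locally, while cp --recursive copies everything):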

    aws s3 cp s3://gdc-target-phs000218-2-open . --recursive
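The ARN-to-bucket step above is just a prefix strip, so it can be scripted. Below is a minimal sketch in POSIX shell; the function name arn_to_s3_uri is hypothetical (not part of the AWS CLI), and it assumes the ARN has the arn:aws:s3::: form shown for TARGET.

```shell
#!/bin/sh
# Hypothetical helper (not an AWS CLI command): turn an S3 ARN, as listed in
# the Registry of Open Data, into the s3:// URI that "aws s3" commands accept.
arn_to_s3_uri() {
    case "$1" in
        arn:aws:s3:::*)
            # Strip the fixed "arn:aws:s3:::" prefix; whatever remains
            # (the bucket name, plus any /key prefix) is kept unchanged.
            echo "s3://${1#arn:aws:s3:::}"
            ;;
        *)
            # Anything else is not an S3 ARN, so report it and fail.
            echo "not an S3 ARN: $1" >&2
            return 1
            ;;
    esac
}
```

For example, aws s3 sync "$(arn_to_s3_uri arn:aws:s3:::gdc-target-phs000218-2-open)" . downloads the TARGET bucket. If you have not configured AWS credentials, adding the global --no-sign-request option should let the CLI read a public bucket anonymously.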