amazon-web-services amazon-s3 download dataset

Download data sets from Amazon


I want to know whether it is possible to download only a portion of a public AWS data set and, if so, how to do it.

Specifically, I want to download part of the Common Crawl corpus to run local tests.


Solution

  • It looks like you can. If you point your browser to the public URL that Amazon provides for the data set, you will see links for the complete sets as well as for subsets.

    You can download any of them with your browser or with any S3 client tool or library.
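
If you would rather script the download than click through the browser, something along these lines might work. It is only a rough sketch using the Python standard library, and it assumes the object is publicly readable over plain HTTPS. The bucket name and key are hypothetical placeholders, not real paths from the data set, and the helper names (`public_s3_url`, `build_request`, `download`) are made up for this example. The optional `Range` header asks for just the first bytes of a file, which can be handy when even a single file is too large for a local test.

```python
import shutil
import urllib.request
from typing import Optional
from urllib.parse import quote


def public_s3_url(bucket: str, key: str) -> str:
    """Build the virtual-hosted-style HTTPS URL of an S3 object.

    This only works for anonymous access if the bucket owner allows public reads.
    """
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def build_request(url: str, max_bytes: Optional[int] = None) -> urllib.request.Request:
    """Create a GET request, optionally limited to the first max_bytes bytes."""
    request = urllib.request.Request(url)
    if max_bytes is not None:
        # HTTP byte ranges are inclusive, so 0..max_bytes-1 is max_bytes bytes.
        request.add_header("Range", f"bytes=0-{max_bytes - 1}")
    return request


def download(url: str, dest: str, max_bytes: Optional[int] = None) -> None:
    """Stream the response body to a local file without loading it into memory."""
    with urllib.request.urlopen(build_request(url, max_bytes)) as response:
        with open(dest, "wb") as out:
            shutil.copyfileobj(response, out)


if __name__ == "__main__":
    # Hypothetical bucket and key: replace them with a link you found
    # on the data set's public page.
    url = public_s3_url("example-public-bucket", "path/to/one-segment.gz")
    download(url, "one-segment.gz", max_bytes=10 * 1024 * 1024)  # first 10 MiB only
```

Note that a compressed file cut off by a byte range will usually not decompress cleanly to the end, so for real tests it is probably simpler to pick one small complete file from the listing and drop `max_bytes`.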