Tags: hadoop-streaming, elastic-map-reduce

AWS Elastic MapReduce Streaming. Use data from nested folders as input


I have data laid out as s3n://bucket/{date}/{file}.gz, with more than 100 date folders. How do I set up a streaming job that uses all of them as input? Specifying s3n://bucket/ as the input didn't work, since the top-level entries are folders rather than files.


Solution

  • Specify s3n://bucket/*/ as the input and it should work fine: the wildcard expands to each date folder, and Hadoop then reads the files inside each one.
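
For illustration, here is a sketch of what the hadoop-streaming invocation might look like with the wildcard input. The bucket name, output path, and mapper/reducer scripts are placeholders, and the streaming jar path varies by Hadoop version:

```shell
# Hypothetical example; quote the input so the shell
# doesn't expand the * locally -- Hadoop resolves it against S3.
hadoop jar /path/to/hadoop-streaming.jar \
  -input 's3n://bucket/*/' \
  -output 's3n://bucket/output/' \
  -mapper mapper.py \
  -reducer reducer.py
```

Gzipped files (.gz) are decompressed automatically by the streaming job's input format, so no extra flag is needed for the {file}.gz inputs.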