hadoop amazon-s3 elastic-map-reduce amazon-emr

Is it possible to run hadoop fs -getmerge in S3?


I have an Elastic MapReduce job that writes several files to S3, and I want to concatenate them all into a single text file.

Currently I manually copy the folder with all the files to our HDFS (hadoop fs -copyFromLocal), then run hadoop fs -getmerge and hadoop fs -copyToLocal to obtain the file.

Is there any way to use hadoop fs directly on S3?


Solution

  • Actually, this response about getmerge is incorrect. getmerge expects a local destination and will not write its output to S3. If you try, it throws an IOException and responds with -getmerge: Wrong FS:.

    Usage:

    hadoop fs [generic options] -getmerge [-nl] <src> <localdst>
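
Going by the usage line above, only the destination has to be local, so one possible workaround is to point <src> at the S3 folder, merge into a local file, and upload the result afterwards if it needs to live in S3. The following is a minimal sketch under that assumption, not a verified recipe: the function name merge_s3_output, the HADOOP_CMD variable, and every bucket and path are hypothetical placeholders, and the URI scheme (s3://, s3n:// or s3a://) depends on your cluster.

```shell
#!/usr/bin/env bash
# Sketch: merge the part files under an S3 folder into one local file,
# then optionally upload the merged file back to S3.
# All names and paths here are hypothetical placeholders.

# Set HADOOP_CMD="echo hadoop" to print the commands instead of running them.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

merge_s3_output() {
  local src="$1"        # folder holding the job output, e.g. an S3 URI
  local local_dst="$2"  # must be a path on the local filesystem
  local s3_dst="${3:-}" # optional: where to upload the merged file

  # getmerge takes <src> <localdst>: the destination has to be local.
  $HADOOP_CMD fs -getmerge "$src" "$local_dst" || return 1

  # Upload the merged file only if a destination was given.
  if [ -n "$s3_dst" ]; then
    $HADOOP_CMD fs -put "$local_dst" "$s3_dst" || return 1
  fi
}

# Example call (placeholder paths):
#   merge_s3_output s3://my-bucket/job-output/ /tmp/merged.txt s3://my-bucket/merged.txt
```

The merged file has to fit on the local disk of the machine running the command, so for very large outputs this approach may not be practical.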