Is there a way to copy a list of files from S3 to HDFS, instead of a complete folder, using s3distcp? This is for cases where srcPattern cannot work.
I have multiple files in an S3 folder, all with different names. I want to copy only specific files to an HDFS directory. I did not find any way to pass multiple source file paths to s3distcp.
The workaround I am currently using is to list all the file names in srcPattern:
hadoop jar s3distcp.jar \
  --src s3n://bucket/src_folder/ \
  --dest hdfs:///test/output/ \
  --srcPattern '.*somefile.*|.*anotherone.*'
Can this approach work when the number of files is very large, say around 10,000?
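Yes, you can. Create a manifest file that lists all the files you need and use the --copyFromManifest option. The manifest itself is supplied through --previousManifest; --copyFromManifest tells S3DistCp to treat it as the list of files to copy rather than the list of files to skip.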
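
As a rough sketch of the manifest approach, the script below builds a gzip-compressed manifest from a plain list of file names and prints the matching s3distcp command. Several things here are assumptions rather than facts from the question: the names files.txt and manifest.gz, the manifest location s3n://bucket/manifests/, and above all the per-line JSON layout (path, baseName, srcDir, size), which is modeled on what --outputManifest appears to write. It is worth generating one real manifest with --outputManifest on a small copy and comparing before relying on this.

```shell
#!/bin/sh
# Sketch only: build a manifest for s3distcp from a list of file names.
SRC="s3n://bucket/src_folder"
DEST="hdfs:///test/output/"
LIST="files.txt"        # hypothetical: one file name per line
MANIFEST="manifest.gz"  # hypothetical local output name
# Hypothetical location the manifest would be uploaded to before the copy.
MANIFEST_URI="s3n://bucket/manifests/manifest.gz"

# Demo list so the sketch runs on its own; replace with the real list.
[ -f "$LIST" ] || printf '%s\n' somefile.csv anotherone.csv > "$LIST"

# Write one JSON object per line, then gzip the result.
# ASSUMPTION: the field layout mirrors manifests written by --outputManifest.
# size is a placeholder (0); names containing quotes or backslashes would
# need JSON escaping, which this sketch does not do.
while IFS= read -r name; do
  [ -n "$name" ] || continue   # skip blank lines
  printf '{"path":"%s/%s","baseName":"%s","srcDir":"%s","size":0}\n' \
    "$SRC" "$name" "$name" "$SRC"
done < "$LIST" | gzip -c > "$MANIFEST"

# Print the copy command instead of running it, so it can be reviewed first.
echo "hadoop jar s3distcp.jar --src $SRC/ --dest $DEST --previousManifest=$MANIFEST_URI --copyFromManifest"
```

The manifest would then need to be uploaded to the location given in MANIFEST_URI before running the printed command. Because the file list lives in the manifest rather than in a regular expression, the command line stays the same length whether there are ten files or 10,000.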