I have a MapReduce job where the file input path is: /basedirectory/*/*.txt
Inside the basedirectory, I have different subfolders (CaseA, CaseB, etc.), each of which contains HDFS text files.
In the map phase of the job, I want to find out where exactly the data shard came from (e.g. CaseA). How can I achieve that?
I've done something similar for MapReduce jobs with more than one input HBase table, where I use context.getInputSplit().getTableName() to find the actual table name, but I'm not sure what to do for HDFS input files.
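You can get the input split using context.getInputSplit() (where context is the Mapper.Context). For file-based input formats the split is a FileSplit, so cast it to FileSplit and then call getPath() on it to get the file path. The name of that path's parent directory, getPath().getParent().getName(), is the subfolder the data came from (e.g. CaseA).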
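Here is a minimal sketch of that approach, assuming the new org.apache.hadoop.mapreduce API and a plain file input format such as TextInputFormat. The class name SourceTaggingMapper, the helper folderOf, and the choice to emit the folder name as the output key are made up for illustration. Note that with MultipleInputs or a combine-file input format the split is not a FileSplit, so the cast would not apply as written.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical mapper: tags every input line with the subfolder it was read from.
public class SourceTaggingMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Text sourceFolder = new Text();

    // Hypothetical helper: name of the directory that directly contains the file,
    // e.g. /basedirectory/CaseA/data.txt gives "CaseA".
    public static String folderOf(Path file) {
        Path parent = file.getParent();
        return parent == null ? "" : parent.getName();
    }

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // A map task processes exactly one split, so resolve the folder once here.
        InputSplit split = context.getInputSplit();
        if (!(split instanceof FileSplit)) {
            // Assumption: the job uses a plain file input format (e.g. TextInputFormat).
            throw new IOException("Expected a FileSplit but got " + split.getClass().getName());
        }
        sourceFolder.set(folderOf(((FileSplit) split).getPath()));
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // Emit (subfolder, line); replace with whatever the job actually needs.
        context.write(sourceFolder, line);
    }
}
```

Doing the lookup in setup() rather than in map() avoids repeating the cast for every record, since the split does not change during a map task.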