I have a map-only job that processes a large text file. Each line is analyzed and categorized, and MultipleOutputs is used to write each category to separate files. Eventually all the data is added to a Hive table dedicated to each category. My current workflow does the job but is a bit cumbersome. I am going to add a couple of categories and thought I might be able to streamline the process. I have a couple of ideas and am looking for some input.
Current Workflow:
Possible new workflows
For MultipleOutputs, set the job's output path to the base folder where your Hive external tables are located.
Then write the data to "<table_name>/<filename_prefix>".
Your data will then land directly in the folders of your target tables.
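As a rough sketch of the naming part of this approach, the helper below builds the "<table_name>/<filename_prefix>" string for a category. The class name `CategoryPaths`, its methods, and the rule that a table folder is the lower-cased category name are all assumptions made for illustration; adjust them to match how your external tables are actually named.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of Hadoop): builds the relative path
 * "<table_name>/<filename_prefix>" that is handed to MultipleOutputs.
 */
final class CategoryPaths {
    private CategoryPaths() {}

    // Assumption: the Hive table folder is the category name, lower-cased,
    // with any character outside [a-z0-9_] replaced by an underscore.
    static String tableDir(String category) {
        return category.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9_]", "_");
    }

    // Joins the table folder and the file prefix with a single slash.
    static String baseOutputPath(String category, String filePrefix) {
        return tableDir(category) + "/" + filePrefix;
    }

    public static void main(String[] args) {
        System.out.println(baseOutputPath("Errors", "part"));       // errors/part
        System.out.println(baseOutputPath("slow queries", "part")); // slow_queries/part
    }
}
```

In the mapper, the resulting string would be passed as the third argument of `MultipleOutputs.write(key, value, baseOutputPath)`. With the job's output path set to the tables' base folder, a map-only job should then produce files such as `errors/part-m-00000` under that folder.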