My project needs to collect log data with Flume and feed that data into a Hive table.
Specifically, I need to move files that are dropped into a folder into HDFS, which I am doing with a spooldir source. After that, I need to process these files and place the output in the Hive table's folder so that the data can be queried immediately.
Can the sink process the source files so that the data written to HDFS is already in the required format?
Thanks, Sathish
The configuration below served my purpose:

    source.type = spooldir
    source.spoolDir = ${location}
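For reference, a complete agent built around that source might look like the sketch below. It is an illustrative example, not the configuration from this answer. The agent, source, channel, sink and interceptor names (a1, r1, c1, k1, i1, i2), both paths, the roll settings and the tab-to-comma conversion are all hypothetical. The sketch shows where processing can happen. Per-event reshaping is done by interceptors attached to the source. The HDFS sink only controls how events are written, here as plain text into the directory that backs a Hive table.

```properties
# Hypothetical agent "a1": one source, one channel, one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Spooling-directory source: reads completed files dropped into the folder
# (hypothetical path)
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/incoming
a1.sources.r1.channels = c1

# Interceptors reshape each event (one line of input) before it reaches the channel
a1.sources.r1.interceptors = i1 i2
# i1: drop blank or whitespace-only lines
a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = ^\\s*$
a1.sources.r1.interceptors.i1.excludeEvents = true
# i2: hypothetical reformatting step that turns tab-separated fields into comma-separated ones
a1.sources.r1.interceptors.i2.type = search_replace
a1.sources.r1.interceptors.i2.searchPattern = \t
a1.sources.r1.interceptors.i2.replaceString = ,

# In-memory channel (events are lost if the agent dies; a file channel is the durable option)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# HDFS sink writing plain text into the Hive table's directory (hypothetical path)
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /user/hive/warehouse/logs
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
# Hive skips files whose names start with "_", so files still being written stay hidden
a1.sinks.k1.hdfs.inUsePrefix = _
# Roll to a new file every 5 minutes rather than by size or event count
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
```

Suppose a Hive external table uses that same HDFS directory as its LOCATION and declares ',' as its field delimiter. Each file then becomes queryable as soon as the sink rolls it. Check that your Flume version includes the search_replace interceptor. Anything beyond simple regex rewrites, such as parsing or enrichment, would need a custom interceptor or a separate processing step.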