We have two types of logs:
1) SESSION LOG: SESSION_ID, USER_ID, START_DATE_TIME, END_DATE_TIME
2) EVENT LOG: SESSION_ID, DATE_TIME, X, Y, Z
We only need to store the event log, but we would like to replace each SESSION_ID with its corresponding USER_ID. Which technologies (e.g. Flume?) should we use to store the data in HDFS?
Thanks!
Yes, Flume can be used to move the log files into HDFS.
To replace SESSION_ID with USER_ID, you could do the substitution with shell scripts before ingestion: the scripts generate a modified event log file, and that file is what Flume picks up. This would be the simplest approach.
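Here is a rough sketch of that preprocessing step, written in Python rather than shell. It loads the SESSION LOG into a SESSION_ID → USER_ID lookup table, then rewrites each EVENT LOG row with the USER_ID in place of the SESSION_ID. It makes a few assumptions that are not in the question: both logs are comma-separated with no header row, the columns are in the order listed above, the file paths are hypothetical, and events whose session is missing get the placeholder `UNKNOWN`.

```python
import csv
from typing import Dict, Iterable, Iterator, List


def load_session_map(session_rows: Iterable[List[str]]) -> Dict[str, str]:
    """Build SESSION_ID -> USER_ID from SESSION LOG rows
    (assumed order: SESSION_ID, USER_ID, START_DATE_TIME, END_DATE_TIME)."""
    return {row[0]: row[1] for row in session_rows if row}


def replace_session_ids(
    event_rows: Iterable[List[str]],
    session_map: Dict[str, str],
    unknown: str = "UNKNOWN",  # assumed placeholder for sessions not in the map
) -> Iterator[List[str]]:
    """Yield EVENT LOG rows (SESSION_ID, DATE_TIME, X, Y, Z) with the
    first column replaced by the matching USER_ID."""
    for row in event_rows:
        if not row:
            continue  # skip blank lines
        yield [session_map.get(row[0], unknown)] + row[1:]


def rewrite_event_log(session_path: str, event_path: str, out_path: str) -> None:
    """Write the modified event log file (the one Flume would pick up)."""
    with open(session_path, newline="") as f:
        session_map = load_session_map(csv.reader(f))
    with open(event_path, newline="") as src, open(out_path, "w", newline="") as dst:
        csv.writer(dst).writerows(replace_session_ids(csv.reader(src), session_map))


if __name__ == "__main__":
    # Hypothetical file names; point out_path at the directory Flume reads from.
    rewrite_event_log("session.log", "event.log", "event_with_user.log")
```

The whole session map is held in memory, which is fine as long as the session log is of modest size. The event log, however, is streamed row by row, so it can be arbitrarily large.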