I'm using Flume 1.5.0 to collect logs from application servers. Say I have three app servers, App-A, App-B, and App-C, and one HDFS server where Hive is running. Flume agents run on all three app servers and forward the log messages to the HDFS server, where another Flume agent receives them and finally stores the logs in the Hadoop file system. I have created an external Hive table to map this log data. Everything works smoothly except that Hive is unable to parse the log data properly and load it into the table.
Here's my Flume and Hive configuration:
Dummy Log File Format (| separated): ClientId|App Request|URL
Flume conf at App servers:
app-agent.sources = tail
app-agent.channels = memoryChannel
app-agent.sinks = avro-forward-sink
app-agent.sources.tail.type = exec
app-agent.sources.tail.command = tail -F /home/kuntal/practice/testing/application.log
app-agent.sources.tail.channels = memoryChannel
app-agent.channels.memoryChannel.type = memory
app-agent.channels.memoryChannel.capacity = 100000
app-agent.channels.memoryChannel.transactionCapacity = 10000
app-agent.sinks.avro-forward-sink.type = avro
app-agent.sinks.avro-forward-sink.hostname = localhost
app-agent.sinks.avro-forward-sink.port = 10000
app-agent.sinks.avro-forward-sink.channel = memoryChannel
Flume conf at HDFS server:
hdfs-agent.sources = avro-collect
hdfs-agent.channels = memoryChannel
hdfs-agent.sinks = hdfs-write
hdfs-agent.sources.avro-collect.type = avro
hdfs-agent.sources.avro-collect.bind = localhost
hdfs-agent.sources.avro-collect.port = 10000
hdfs-agent.sources.avro-collect.channels = memoryChannel
hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity = 100000
hdfs-agent.channels.memoryChannel.transactionCapacity = 10000
hdfs-agent.sinks.hdfs-write.channel = memoryChannel
hdfs-agent.sinks.hdfs-write.type = hdfs
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://localhost:9000/user/flume/tail_table/avro
hdfs-agent.sinks.hdfs-write.rollInterval = 30
Hive external table:
CREATE EXTERNAL TABLE IF NOT EXISTS test(clientId int, itemType string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
LOCATION '/user/flume/tail_table/avro';
Please suggest what I should do. Do I need to include AvroSerDe on the Hive side?
The HDFS sink was missing the following three settings:
hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text
hdfs-agent.sinks.hdfs-write.hdfs.rollInterval = 30
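Without hdfs.fileType = DataStream the sink falls back to its default SequenceFile format, and the rollInterval line in the question lacked the hdfs. prefix. As a result, the data was not stored in HDFS as plain text, and Hive could not load it into the delimited-text table. With these settings in place it now works fine!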
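For reference, here is a sketch of how the hdfs-write sink section might look with those settings merged into the configuration from the question. It assumes the same agent, sink, and channel names and the same HDFS path as above. Nothing else is changed, and the unprefixed rollInterval line is dropped.

```properties
# HDFS sink on the collector agent (names and path taken from the question)
hdfs-agent.sinks.hdfs-write.type = hdfs
hdfs-agent.sinks.hdfs-write.channel = memoryChannel
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://localhost:9000/user/flume/tail_table/avro
# Write the raw event body as a plain stream instead of a SequenceFile
hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream
# Serialize records as text
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text
# Roll to a new file every 30 seconds (note the hdfs. prefix)
hdfs-agent.sinks.hdfs-write.hdfs.rollInterval = 30
```

After restarting the agent, the newly rolled files under the sink path should be readable as plain text, for example with hdfs dfs -cat.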