I'm trying to put a CSV file into HDFS using Flume; the file also contains some Unicode characters.
Once the file is in HDFS I tried to view its contents, but I am unable to see the records properly.
File content:
Name age sal msg
Abc 21 1200 Lukè éxample àpple
Xyz 23 1400 er stîget ûf mit grôzer
Output in console:
I ran hdfs dfs -get /flume/events/csv/events.1234567
Below is the output:
Name,age,sal,msg
Abc,21,1200,Luk��xample��pple
Xyz,23,1400,er st�get �f mit gr�zer
Does Flume support Unicode characters? If not, how can this be handled?
Yes, Flume does support Unicode characters. You can read a Unicode file with Flume and transfer the data to HDFS, so this looks like some other issue. Note that the HDFS sink's default hdfs.fileType is SequenceFile, a binary container format, so fetching such a file and viewing it as plain text will show garbage around the records. Change hdfs.fileType to DataStream and see if you can read the output properly.
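It is also worth checking whether the bytes themselves reached HDFS intact and only the viewer is mis-decoding them. A quick sketch, assuming `iconv` is available and using the path from the question (the local filename is illustrative):

```shell
# Copy the file out of HDFS, then validate its bytes as UTF-8.
# If validation passes, Flume delivered the data intact and the console
# locale (check $LANG / $LC_ALL) is the likely culprit; if it fails, the
# source file was probably never UTF-8 (e.g. Latin-1) to begin with.
hdfs dfs -get /flume/events/csv/events.1234567 events.local
if iconv -f UTF-8 -t UTF-8 events.local > /dev/null 2>&1; then
    echo "bytes are valid UTF-8"
else
    echo "bytes are NOT valid UTF-8"
fi
```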
a1.sources = r1
a1.channels = c1
a1.sinks = k1
#source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /root/user/shashi/unicode/french.txt
a1.sources.r1.restart = true
#sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.fileType = DataStream
#channel
a1.channels.c1.type = memory
#connect
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
The above is a sample configuration that I have used.
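As an aside, the `�` boxes in the question's console output are the Unicode replacement character, which a UTF-8 terminal prints when it hits byte sequences that are not valid UTF-8. A small sketch of that failure mode (the sample bytes are illustrative, taken from the question's first record):

```shell
# Write "Lukè éxample" as Latin-1 bytes (0350 = è, 0351 = é in octal).
printf 'Luk\350 \351xample\n' > latin1-sample.txt

# These bytes are invalid UTF-8, so strict validation fails -- exactly
# the case where a UTF-8 console substitutes replacement characters.
if ! iconv -f UTF-8 -t UTF-8 latin1-sample.txt > /dev/null 2>&1; then
    echo "not valid UTF-8: a UTF-8 terminal will show replacement marks"
fi

# Converting from the true source encoding recovers the text.
iconv -f ISO-8859-1 -t UTF-8 latin1-sample.txt   # prints: Lukè éxample
```

If this is the situation, no Flume setting will help: convert the source file to UTF-8 (e.g. with `iconv`) before ingesting it.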