
Unicode characters with Flume


I'm trying to put a CSV file into HDFS using Flume; the file also contains some Unicode characters.

Once the file is in HDFS, I tried to view its content, but I can't see the records properly.

File content

Name    age  sal    msg

Abc     21  1200    Lukè éxample àpple

Xyz     23  1400    er stîget ûf mit grôzer

Output in console

I ran hdfs dfs -get /flume/events/csv/events.1234567

Below is the output

Name,age,sal,msg

Abc,21,1200,Luk��xample��pple

Xyz,23,1400,er st�get �f mit gr�zer

Does Flume support Unicode characters? If not, how can this be handled?


Solution

  • Yes, Flume does support Unicode characters. You can read your Unicode file with Flume and transfer the data to HDFS, so this looks like some other issue. Change hdfs.fileType to DataStream and see if you can read the output properly.

    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1
    
    #source
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /root/user/shashi/unicode/french.txt
    a1.sources.r1.restart = true
    
    #sink
    
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /flume/events/
    a1.sinks.k1.hdfs.filePrefix = events-
    a1.sinks.k1.hdfs.round = true
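    # DataStream writes the raw event body as text; the default fileType (SequenceFile) wraps events in a binary container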
    a1.sinks.k1.hdfs.fileType = DataStream
    #channel
    a1.channels.c1.type = memory
    
    #connect
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    

    The above is a sample configuration that I have used.
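
    If the output still looks wrong after switching to DataStream, it is worth ruling out a display problem on the reading side. The commands below are only a rough check, assuming the source file is UTF-8 and that the sink wrote files with the events- prefix from the configuration above; the en_US.UTF-8 locale name is just an example.

        # check the declared encoding of the source file (should report charset=utf-8)
        file -i /root/user/shashi/unicode/french.txt

        # view the HDFS output in a UTF-8 locale so the terminal can render the characters
        export LANG=en_US.UTF-8
        hdfs dfs -cat /flume/events/events-* | head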