Tags: hdfs, apache-nifi, filebeat, elastic-beats

Ingesting from NiFi into HDFS as a single file instead of many files in a directory


Scenario

I have a CSV file named test_csv.csv on a Windows machine and I am ingesting it into HDFS. The pipeline is: Beats > (ListenBeats) NiFi (PutHDFS) > HDFS

Data sample:

a,b,c,d,e
a1,b1,c1,d1,e1
a2,b2,c2,d2,e2
a3,b3,c3,d3,e3
a4,b4,c4,d4,e4
a5,b5,c5,d5,e5
a6,b6,c6,d6,e6
a7,b7,c7,d7,e7
a8,b8,c8,d8,e8

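According to the NiFi flow UI, the flow works fine and the data is written to HDFS successfully. The problem is that the target directory ends up with nine separate files instead of one CSV file: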

hadoop@ambari:~$ hdfs dfs -ls /user/nifi/test
Found 9 items
-rw-r--r--   3 nifi hdfs        480 2020-07-06 14:30 /user/nifi/test/0192a8bb-67ec-462e-a602-62a5425afc99
-rw-r--r--   3 nifi hdfs        480 2020-07-06 14:30 /user/nifi/test/0211ec05-fc62-4b82-87e5-a2e20a9fb07e
-rw-r--r--   3 nifi hdfs        481 2020-07-06 14:30 /user/nifi/test/1e227df9-f49f-46d6-a309-25e466fa14cf
-rw-r--r--   3 nifi hdfs        480 2020-07-06 14:30 /user/nifi/test/324a0c0e-e190-4239-b594-edbf9fcab0d6
-rw-r--r--   3 nifi hdfs        474 2020-07-06 14:30 /user/nifi/test/3d34827b-6bae-4c21-981e-9722b7a6703e
-rw-r--r--   3 nifi hdfs        481 2020-07-06 14:30 /user/nifi/test/6873c51b-a93b-4872-b33c-0e59b85afcd5
-rw-r--r--   3 nifi hdfs        480 2020-07-06 14:30 /user/nifi/test/98606d6b-2206-4b2e-8204-8363a87f41d0
-rw-r--r--   3 nifi hdfs        480 2020-07-06 14:30 /user/nifi/test/f25e56b5-88d7-4135-b475-213e4e54b47f
-rw-r--r--   3 nifi hdfs        480 2020-07-06 14:30 /user/nifi/test/f354f587-8da2-418f-be0d-34e8a79d7d39

I tried changing the PutHDFS Directory property to /user/nifi/test.csv, but that only creates a directory with that name:

hadoop@ambari:~$ hdfs dfs -cat /user/nifi/test.csv
cat: `/user/nifi/test.csv': Is a directory
hadoop@ambari:~$ hdfs dfs -ls /user/nifi/test.csv
Found 9 items
-rw-r--r--   3 nifi hdfs        480 2020-07-06 14:35 /user/nifi/test.csv/02cdc89d-3cb9-494a-b7f5-d280d7b7c65e
-rw-r--r--   3 nifi hdfs        480 2020-07-06 14:35 /user/nifi/test.csv/2476906a-00d9-463a-89ef-ea885f823faa
-rw-r--r--   3 nifi hdfs        474 2020-07-06 14:35 /user/nifi/test.csv/5b9a9d7e-0c2f-428c-8af4-e875c6db1a04
-rw-r--r--   3 nifi hdfs        480 2020-07-06 14:35 /user/nifi/test.csv/66017da5-b55f-437b-a3cf-0a6b45d86ce8
-rw-r--r--   3 nifi hdfs        480 2020-07-06 14:35 /user/nifi/test.csv/7be93660-75a1-416b-b019-656d466813d6
-rw-r--r--   3 nifi hdfs        480 2020-07-06 14:35 /user/nifi/test.csv/98877296-126c-4ac9-9da5-cef62937e9f9
-rw-r--r--   3 nifi hdfs        481 2020-07-06 14:35 /user/nifi/test.csv/ac075d33-1137-4aea-9e5b-fc11097558eb
-rw-r--r--   3 nifi hdfs        480 2020-07-06 14:35 /user/nifi/test.csv/b9b44c08-1bc6-4e33-947b-daf265491181
-rw-r--r--   3 nifi hdfs        481 2020-07-06 14:35 /user/nifi/test.csv/ba6464db-ef64-4993-a070-80f1392eac1e

Is it possible to make NiFi write to HDFS as a single file in a directory? I was expecting it to create a test.csv file in HDFS.

Thank you


Solution

  • Every flow file in NiFi has an attribute named "filename", and PutHDFS uses that attribute as the name of the file in HDFS. The "Directory" property in PutHDFS names only the directory, so set it to just "/user/nifi".

    To change the filename, put an UpdateAttribute processor right before PutHDFS and set filename = whatever-you-want.csv

    If you set filename to a static value, every write after the first one finds an existing file with that name and conflicts with it, so PutHDFS will either replace the file or throw an error, depending on its Conflict Resolution Strategy. You probably want to use a MergeContent/MergeRecord processor first to batch the many small CSV entries into one larger flow file, and then create a dynamic filename like:

    filename = test-${now()}.csv

    You can use a different expression, for example ${now():toNumber()} or ${UUID()}, as long as it produces something unique such as a timestamp, date string, or UUID.
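
To make the naming rule above concrete, here is a minimal sketch in plain Python. It does not call NiFi or HDFS; it only models what the answer describes: the final path is the Directory property joined with the flow file's "filename" attribute, small entries are batched before writing, and a timestamp keeps each name unique. The helper names (target_path, merge_records, timestamped_filename) and the timestamp format are assumptions made for illustration, not NiFi APIs.

```python
from datetime import datetime, timezone
import posixpath


def target_path(directory: str, attributes: dict) -> str:
    """Hypothetical model: Directory property + 'filename' attribute."""
    return posixpath.join(directory, attributes["filename"])


def merge_records(records: list) -> str:
    """Hypothetical stand-in for batching small entries into one content body."""
    return "\n".join(records) + "\n"


def timestamped_filename(prefix: str, when: datetime, suffix: str = ".csv") -> str:
    """Build a name that is unique per millisecond, in the spirit of test-${now()}.csv."""
    stamp = when.strftime("%Y%m%d%H%M%S") + f"{when.microsecond // 1000:03d}"
    return f"{prefix}-{stamp}{suffix}"


# Behaviour seen in the question: the filename attribute holds a UUID-like value,
# so each flow file becomes its own file under the configured directory.
flowfile = {"filename": "0192a8bb-67ec-462e-a602-62a5425afc99"}
default_target = target_path("/user/nifi/test", flowfile)

# Batch the small CSV entries first, then override the filename attribute.
merged = merge_records(["a,b,c,d,e", "a1,b1,c1,d1,e1", "a2,b2,c2,d2,e2"])
written_at = datetime(2020, 7, 6, 14, 30, 0, tzinfo=timezone.utc)
flowfile["filename"] = timestamped_filename("test", written_at)
renamed_target = target_path("/user/nifi", flowfile)

print(default_target)   # /user/nifi/test/0192a8bb-67ec-462e-a602-62a5425afc99
print(renamed_target)   # /user/nifi/test-20200706143000000.csv
```

In the real flow these three steps would map to MergeContent or MergeRecord, UpdateAttribute, and PutHDFS respectively; the sketch only shows why the Directory property should stop at /user/nifi and why the filename needs a unique component.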