I have a requirement to interpret and parse a log file whose contents are delimited by comma. The contents of the log file look like below. There is no header column in the file. I am putting it for our convenience.
Event_TimeStamp, Target_IP, UserName, Source_IP, DatabaseName, Activity, QueryText, Flag
202101xxxx, 10.2.x.x, u1, 172.5.x.x, DB1, Connect, , 0
202101xxxx, 10.2.x.x, u1, 172.5.x.x, DB1, Query, 'select * from table1', 0
202101xxxx, 10.2.x.x, u1, 172.5.x.x, DB1, Query, 'select * from table2', 0
202101xxxx, 10.2.x.x, u1, 172.5.x.x, DB1, Query, 'select colname1,colname2,colname3,colname4 from table1', 0
202101xxxx, 10.2.x.x, u1, 172.5.x.x, DB1, Query, 'select colname1,\'some Stringliteral\' as colname2,colname3,colname4 from table2', 0
202101xxxx, 10.2.x.x, u1, 172.5.x.x, DB1, DisConnect, , 0
I have used the convertRecord
and configured CSVRecordReader
and CSVSetRecordWriter
to read/interpret the data and write the data to flow file respectively. The schema for CSV Read is as below.
{
"type": "record",
"namespace": "nifi",
"name": "db",
"fields": [
{ "name": "Event_Timestamp", "type": "string" },
{ "name": "SourceIP", "type": "string" },
{ "name": "UserName", "type": "string" },
{ "name": "TargetIP", "type": "string" },
{ "name": "DatabaseName", "type": "string" },
{ "name": "Activity", "type": "string" },
{ "name": "QueryText", "type": "string" },
{ "name": "Flag", "type": "int" },
]
}
The nifi flow is failing in reading the data because the delimiter configured while setting up CSVRecordReader
is ","(comma) and the QueryText also contains comma within the text. If I only put the first 3 lines in the new log file and run the flow, then the job is successful since the Querytext doesn't contain any commas within.
I do not know if the CSVReader is the right processor to use here. Can anyone help me how to read a log file that is comma-separated but, one or a few fields has comma's within the text. Any help is highly appreciated.
QueryText field values are in single quote ('
) so configure your CSVRecordReader property Quote Character = '
default is "
Also set property CSV Parser = Jackson CSV