Tags: elasticsearch, logstash, logstash-grok

How to parse a CSV file where a field contains the separator (comma) in its value


Sample message: 111,222,333,444,555,val1in6th,val2in6th,777

The sixth column's value itself contains a comma (val1in6th,val2in6th is a sample value for the 6th column). When I use a simple csv filter, this message gets split into 8 fields. I want to tell the filter that val1in6th,val2in6th should be treated as a single value and placed in the 6th column (it is fine if the comma between val1in6th and val2in6th is dropped in the output).
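
For context, here is a minimal sketch of the naive setup that produces the 8-field split. It assumes the event text is in the default message field; with no columns option set, the Logstash csv filter names the fields column1, column2, and so on:

    filter {
      csv {
        source    => "message"
        separator => ","
        # The data has no quoting, so the sample message is split on every
        # comma: val1in6th lands in column6, val2in6th in column7, 777 in column8.
      }
    }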


Solution

  • Change your plugin: drop the csv filter and use the grok filter instead - doc here. Then use a debugger to build a parser for your lines, such as this one: https://grokdebug.herokuapp.com/ (a full filter block is sketched after this list).

    For your lines you could use this grok expression:

    %{WORD:FIELD1},%{WORD:FIELD2},%{WORD:FIELD3},%{WORD:FIELD4},%{WORD:FIELD5},%{GREEDYDATA:FIELD6}
    

    or:

    %{INT:FIELD1},%{INT:FIELD2},%{INT:FIELD3},%{INT:FIELD4},%{INT:FIELD5},%{GREEDYDATA:FIELD6}
    

    This second expression changes the data types of the first 5 fields in Elasticsearch. (Note that a captured value stays a string unless you also append a type to the pattern, e.g. %{INT:FIELD1:int}.)

    To learn more about parsing CSV with grok in Elasticsearch, you can follow the official ES blog guide; it explains how to use grok in an ingest pipeline, but the approach is the same with Logstash (a hedged ingest-pipeline sketch follows below).
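
    Putting one of the expressions above into a pipeline, a minimal Logstash filter block might look like this (FIELD1 .. FIELD6 are the illustrative names from the expressions; the event text is assumed to be in message):

    filter {
      grok {
        match => {
          "message" => "%{INT:FIELD1},%{INT:FIELD2},%{INT:FIELD3},%{INT:FIELD4},%{INT:FIELD5},%{GREEDYDATA:FIELD6}"
        }
        # GREEDYDATA captures everything after the fifth comma, so for the
        # sample message FIELD6 becomes "val1in6th,val2in6th,777".
      }
    }

    If the trailing 777 should stay a separate field, ending the pattern with ,%{INT:FIELD7} after the GREEDYDATA capture would let regex backtracking peel it off; the block above follows the answer as written.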
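
    Since the linked guide works with ingest pipelines, here is a sketch of the ingest-pipeline equivalent; the pipeline name parse-csv-with-grok is made up for the example:

    PUT _ingest/pipeline/parse-csv-with-grok
    {
      "description": "Parse comma-separated messages with grok",
      "processors": [
        {
          "grok": {
            "field": "message",
            "patterns": [
              "%{INT:FIELD1},%{INT:FIELD2},%{INT:FIELD3},%{INT:FIELD4},%{INT:FIELD5},%{GREEDYDATA:FIELD6}"
            ]
          }
        }
      ]
    }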