Tags: ruby, csv, logstash, filebeat

How to resolve parsing error for CSV file in Logstash


I am using Filebeat to send a CSV file to Logstash and then on to Kibana; however, I am getting a parsing error when Logstash picks up the CSV file.

This is the contents of the CSV file:

time    version id  score   type

May 6, 2020 @ 11:29:59.863  1 2 PPy_6XEBuZH417wO9uVe  _doc

The logstash.conf:

input {
  beats {
    port => 5044
  }
}
filter {
  csv {
      separator => ","
      columns =>["time","version","id","index","score","type"]
      }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}

The filebeat.yml:

filebeat.inputs:

# Each - is an input. Most options can be set at the input level, so
# you can use different inputs for various configurations.
# Below are the input specific configurations.

- type: log

  # Change to true to enable this input configuration.
  enabled: true

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /etc/test/*.csv
    #- c:\programdata\elasticsearch\logs\*

and the error in Logstash:

[2020-05-27T12:28:14,585][WARN ][logstash.filters.csv     ][main] Error parsing csv {:field=>"message", :source=>"time,version,id,score,type,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,", :exception=>#<TypeError: wrong argument type String (expected LogStash::Timestamp)>}
[2020-05-27T12:28:14,586][WARN ][logstash.filters.csv     ][main] Error parsing csv {:field=>"message", :source=>"\"May 6, 2020 @ 11:29:59.863\",1,2,PPy_6XEBuZH417wO9uVe,_doc,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,", :exception=>#<TypeError: wrong argument type String (expected LogStash::Timestamp)>}

I do get some data in Kibana but not what I want to see.

[screenshot of the Kibana output]


Solution

  • I have managed to get it to work locally. The mistakes I noticed so far were:

    1. Using ES reserved fields like @timestamp, @version, and more.
    2. The timestamp was not in ISO8601 format (e.g. 2020-05-06T11:29:59.863); it had an @ sign in the middle.
    3. Your filter sets the separator to ",", but your CSV's real separator is a tab ("\t").
    4. According to the error, the filter is also trying to parse your header line; I suggest you remove it from the CSV or use the skip_header option. (See the sketch right after this list for handling points 2-4.)
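    If you would rather keep your original file untouched, a csv + date filter along these lines should deal with the tab separator, the header line, and the non-ISO timestamp. This is only a sketch: the date pattern is my guess based on your sample row, and the "\t" escape only works with config.support_escapes: true in logstash.yml (otherwise paste a literal tab between the quotes):

    filter {
        csv {
            separator => "\t"    # requires config.support_escapes: true, or use a literal tab
            skip_header => true  # drop the title row instead of parsing it
            columns => ["time","version","id","score","type"]
        }
        date {
            # parse "May 6, 2020 @ 11:29:59.863" into @timestamp
            match => ["time", "MMM d, yyyy '@' HH:mm:ss.SSS"]
        }
    }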

    Below is the logstash.conf file I used:

    input {
        file {
            path => "C:/work/elastic/logstash-6.5.0/config/test.csv"
            start_position => "beginning"  # read the file from the start, not only new lines
        }
    }
    filter {
        csv {
            separator => ","
            columns => ["time","version","id","score","type"]
        }
    }
    output {
        elasticsearch {
            hosts => ["localhost:9200"]
            index => "csv-test"
        }
    }
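    Note that I tested with a file input. Since you ship the file with Filebeat, you can keep the beats input from your original config and reuse the same filter and output unchanged:

    input {
        beats {
            port => 5044
        }
    }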
    

    The CSV file I used:

    May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
    May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
    May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
    May 6 2020 11:29:59.863,1,PPy_6XEBuZH417wO9uVe,_doc
    

    From my Kibana:

    [screenshot of the parsed documents in Kibana]