Tags: elasticsearch, logstash

Logstash Date Parse Error using CSV file input


Hoping someone can assist me with my issue below:

I have a Logstash conf set up to use the csv input plugin. The data includes a date field with values like the following…

2024-01-09 22:21:04

I then have this logic in my filter block to handle the date…

  date {
    match => ["cart_received_timestamp", "yyyy-MM-dd HH:mm:ss", "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"]
    target => "@timestamp"
  }

I am getting the following error (I'm using a strict date format in my index mapping to reject invalid data). The input date…

2024-01-09 22:21:04

… and the index mapping's expected format…

yyyy-MM-dd HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS'Z'

… should be compatible, since the input matches the first pattern.
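For reference, the strict mapping involved looks something like this (sketched from the error below, not copied verbatim; the index name is masked the same way as in my conf):

    PUT ******-bulk-1
    {
      "mappings": {
        "properties": {
          "@timestamp": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
          }
        }
      }
    }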

It seems like Logstash converts my date to the format below:

'2024-01-09T22:21:04.567414642Z'

… causing the error, as it does not match the formats the index mapping requires for this field.

{"update"=>{"status"=>400, "error"=>{"type"=>"document_parsing_exception", "reason"=>"[1:539] failed to parse field [@timestamp] of type [date] in document with id 'punchout_cart_item_182439'. Preview of field's value: '2024-01-09T22:21:04.567414642Z'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"failed to parse date field [2024-01-09T22:21:04.567414642Z] with format [yyyy-MM-dd HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS'Z']", "caused_by"=>{"type"=>"date_time_parse_exception", "reason"=>"Failed to parse with all enclosed parsers"}

I've tried various changes to the format in the conf file (like ISO8601, adding the convert option to change the field to date_time, and a ruby option ChatGPT recommended with code to change the date format); none of them changed the error condition.
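For reference, the ruby workaround was along these lines (a sketch, not my exact code: it renders @timestamp into a separate millisecond-precision string field; the target field name cart_received_ts_millis is made up here, and it assumes LogStash::Timestamp#time is available, as in stock Logstash 8):

  ruby {
    # Render @timestamp as ISO8601 with exactly three fractional digits,
    # into a hypothetical string field the strict mapping would accept.
    # %L is Ruby strftime's millisecond directive.
    code => "
      ts = event.get('@timestamp')
      event.set('cart_received_ts_millis', ts.time.strftime('%Y-%m-%dT%H:%M:%S.%LZ')) if ts
    "
  }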

Current Logstash conf file (with *** hiding sensitive info)…

input {
  file {
    id => "bulk_***_carts_items_input"
    path => "/etc/logstash/data/****/*.csv"
    max_open_files => 1
    mode => "read"
    start_position => "beginning"
    exit_after_read => true
    tags => ["bulk-load"]
    type => "csv"
    file_completed_action => "log"
    file_completed_log_path => "/etc/logstash/data/processed-log/processed.log"
  }
}

filter {
  csv {
    skip_empty_rows => "true"
    separator => ","
    columns => ['elastic_index_id', 'elastic_index_created_date', 'cart_received_timestamp', 'cart_item_id', 'cart_item_quantity', 'cart_item_description', 'cart_item_unit_price', 'cart_item_curren$
    convert => {
      "cart_item_received_timestamp" => "date_time"
      "cart_item_updated_timestamp" => "date_time"
      "cart_received_timestamp" => "date_time"
      "elastic_index_created_date" => "date_time"
      "session_timestamp" => "date_time"
      }
  }
  date {
    match => ["cart_received_timestamp", "ISO8601"]
    target => "@timestamp"
  }
  mutate {
    remove_field => ["[event]", "[type]", "[host]", "[message]", "[log]"]
  }
}

output {
  elasticsearch {
    cloud_id => "${ES_CLOUD_ID}"
    cloud_auth => "${ES_CLOUD_USERNAME}:${ES_CLOUD_PASSWORD}"
    index => "******-bulk-1"
    action => "update"
    doc_as_upsert => true
    document_id => "%{elastic_index_id}"
  }
  stdout { codec => rubydebug }
}

Solution

  • The format you have, yyyy-MM-dd'T'HH:mm:ss.SSS'Z', expects a date with exactly three fractional-second digits (milliseconds).

    From the date filter docs, on the fraction of a second: maximum precision is milliseconds (SSS). Beyond that, zeroes are appended.

    S: tenths of a second. Example: 0 for a subsecond value 012
    SS: hundredths of a second. Example: 01 for a subsecond value 01
    SSS: thousandths of a second. Example: 012 for a subsecond value 012

    https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html#plugins-filters-date-locale

    You can use strict_date_optional_time_nanos as the timestamp format in your index mapping instead.

    strict_date_optional_time_nanos: A generic ISO datetime parser, where the date must include the year at a minimum, and the time (separated by T) is optional. The fraction-of-a-second part has nanosecond resolution. Examples: yyyy-MM-dd'T'HH:mm:ss.SSSSSSZ or yyyy-MM-dd.

    You can check the Logstash date filter plugin details at the documentation link above.

    PUT test_date4
    {
      "mappings": {
        "properties": {
          "@timestamp": {
            "type": "date",
            "format": "strict_date_optional_time_nanos"
          }
        }
      }
    }
    

    PUT test_date4/_doc/1
    {
      "@timestamp": "2024-01-09T22:21:04.567414642"
    }
    

    GET test_date4/_search
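
    Applied to the index from the question, the fix would be a format list that keeps the plain pattern for the raw CSV value and swaps the strict millisecond pattern for the nanosecond-aware parser. A sketch (the index name is masked as in the conf above; since the format of an existing date field generally can't be changed in place, this would go into a new or recreated index, followed by a reindex if needed):

    PUT ******-bulk-1
    {
      "mappings": {
        "properties": {
          "@timestamp": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||strict_date_optional_time_nanos"
          }
        }
      }
    }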