
Difference between ignore_older and sincedb_path in logstash


I'm gathering data from a CSV file. Since the data should be imported only once, I need to configure Logstash accordingly. Both configs below behave the same way: they bring in only newly added rows. Is there any difference between them?

1.
start_position => "beginning"
ignore_older => 0 

2.
sincedb_path => "/dev/null"
start_position => "end"

Solution

  • From the documentation:

    ignore older

    When the file input discovers a file that was last modified before the specified timespan in seconds, the file is ignored. After it’s discovery, if an ignored file is modified it is no longer ignored and any new data is read. The default is 24 hours.

    And

    start position

    Choose where Logstash starts initially reading files: at the beginning or at the end. The default behavior treats files like live streams and thus starts at the end. If you have old data you want to import, set this to beginning.

    This option only modifies "first contact" situations where a file is new and not seen before, i.e. files that don’t have a current position recorded in a sincedb file read by Logstash. If a file has already been seen before, this option has no effect and the position recorded in the sincedb file will be used.

    So in your case number 1:

    You will start reading files from the beginning. If you have old log files, you should do this; otherwise they won't be parsed, because Logstash will wait for new lines to be appended to them. You are also including ALL files: setting ignore_older to 0 simply includes everything. If you want to exclude everything older than X, set this option to that age (e.g. when you want to reparse all your files but don't care about logs older than 2 weeks).

    Your use case number 2:

    You will start reading all files from the end. Since you are pointing your sincedb path at /dev/null, this happens on every restart, so lines appended while Logstash is down will be ignored: Logstash will not remember where in the file it left off.

    Why you are seeing the same results:

    These options only take effect on startup and for new files. Once Logstash is running, it makes no difference which of the two you use. If you never shut it down (for maintenance or similar), you will not see any difference either.

    The first use case, however, is "better". It will parse all new files from the start, since you specify 0, and it remembers where it left off, which is useful when you shut Logstash down for a while. The second use case will lose data on restarts. It will also ignore all files that were last modified more than 24 hours ago.
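
    As a rough sketch of the first use case, the input block could look like the one below. The file paths are hypothetical placeholders, and the explicit sincedb_path is an assumption added only to show where the read position would be stored; if you leave it out, Logstash picks a default location itself.

    ```
    input {
      file {
        # Hypothetical path to the CSV file being imported.
        path => "/path/to/data.csv"
        # Only applies the first time a file is seen (no sincedb entry yet).
        start_position => "beginning"
        # As in the question; treated in this answer as "do not skip files by age".
        ignore_older => 0
        # Hypothetical persistent location, so the read position survives restarts.
        # Pointing this at /dev/null instead would discard the position each time.
        sincedb_path => "/path/to/sincedb_csv"
      }
    }
    ```

    With a persistent sincedb file, a restart should resume from the recorded position rather than re-reading the file or skipping lines added while Logstash was down.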

    Read more about logstash file input here: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html