So I have a fairly modest Logstash setup for Apache logs that I use on RedHat 7 (production) as well as macOS High Sierra (10.13.6) for development, and something odd has happened since upgrading from Logstash version 6.3.2 to 6.4.1. I use Homebrew on macOS to install and update Logstash, and these issues persist even if I “nuke” my installed Homebrew items and reinstall.
Simply put, static data input files are not being read and ingested on startup in 6.4.1 the way they were in 6.3.2 and earlier. With 6.4.1, I need to manually cat log lines to the target path for Logstash to “wake up” and pick up those new lines, even if I designate the new read mode.
At the end of the day, this setup doesn’t need a sincedb; it can be restarted, read each file from head to end, and we are all happy… At least until Logstash 6.4.1… Now nobody is happy. What can be done to force Logstash to always read data from the beginning of files no matter what?
The Logstash setup I am using just does some filtering of Apache logs for input. The input config reads as follows; note that the file path is slightly tweaked for privacy but is otherwise identical to what I am using right now and have been using for the past year or so without issue:
input {
  file {
    path => "/opt/logstash/coolapp/access_log*"
    exclude => "*.gz"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ignore_older => 0
    close_older => 3600
    stat_interval => 1
    discover_interval => 15
  }
}
For local development, I simply get a copy of the remote Apache server logs and place them in that /opt/logstash/coolapp/ directory.
Then I start up Logstash from the command line like this, with the -f option set so that my coolapp-apache.conf is read:
logstash -f coolapp-apache.conf
Logstash starts up locally and emits its usual pile of startup status messages, ending with this final message:
[2018-09-24T12:40:09,458][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
To me, that indicates it is fully up and running, and when things are working, my data collection output shows a flow of data pouring in… But with Logstash 6.4.1, I see no data flowing in at all.
Tail mode.
The newly updated documentation for the file input plugin (v4.1.5) shows a new mode option that has a read mode and a tail mode. Knowing that the default mode is tail, I tested the setup by doing the following after starting up my local Logstash debugging setup. First, I copied the access_log as follows:
cp /opt/logstash/coolapp/access_log /opt/logstash/coolapp/access_log_BAK
Then I zeroed out the main access_log file using :> like this:
:> /opt/logstash/coolapp/access_log
And finally, I just ran cat to write that copied file’s data back into the now-empty original file like this:
cat /opt/logstash/coolapp/access_log_BAK > /opt/logstash/coolapp/access_log
The second I did that, lo and behold, the data started to flow as expected! I guess the new file input plugin is focused more on tailing a file than reading it? Anyway, that works but is clearly annoying. I don’t develop like this. I need Logstash to simply read the files and parse them.
Read mode.
So I tried the following setup, based on what I saw in the official Logstash file input mode documentation, to just read the files:
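input {
  file {
    path => "/opt/logstash/coolapp/access_log"
    mode => "read"
    # "log" writes completed file paths to file_completed_log_path
    # instead of deleting the files (the read-mode default)
    file_completed_action => "log"
    file_completed_log_path => "/Users/Giacomo1968/Desktop/access_log_foo"
  }
}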
Of course, access_log_foo is just a proof-of-concept file name for testing, but when all is said and done, this read mode utterly does not work on macOS. I have even tried changing the path to something like my desktop, and it still doesn’t work. And the whole “zero out and then append a file” trick described in the tail mode section above doesn’t cut it here since, I guess, the file is not being tailed?
So knowing all of that:
Okay, I figured this out. I am now on Logstash 6.5 and my original config was as follows:
input {
  file {
    path => "/opt/logstash/coolapp/access_log*"
    exclude => "*.gz"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ignore_older => 0
    close_older => 3600
    stat_interval => 1
    discover_interval => 15
  }
}
When I redid it, getting rid of ignore_older and adjusting close_older and stat_interval to use string_duration values, things started working again as expected:
input {
  file {
    path => "/opt/logstash/coolapp/access_log*"
    exclude => "*.gz"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    # ignore_older has been removed entirely
    # close_older and stat_interval now use string_duration values
    close_older => "1 hour"
    stat_interval => "1 second"
    discover_interval => 15
  }
}
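As a quick sanity check, the new string_duration values appear to be the same durations as the old bare numbers (which the file input treats as seconds), which suggests that removing ignore_older is the change that actually mattered. The sketch below uses a hypothetical, deliberately minimal parser, to_seconds, that only understands the two units used in this config; it is not Logstash’s real string_duration parser.

```python
# Hypothetical, minimal model of string_duration: only the units used in
# this config are supported. This is NOT Logstash's real parser.
UNIT_SECONDS = {
    "second": 1,
    "seconds": 1,
    "hour": 3600,
    "hours": 3600,
}


def to_seconds(value):
    """Return a duration in seconds for a bare number or an "N unit" string."""
    if isinstance(value, (int, float)):
        # Bare numbers are interpreted as seconds.
        return value
    amount, unit = value.split()
    return float(amount) * UNIT_SECONDS[unit]


old_settings = {"close_older": 3600, "stat_interval": 1}
new_settings = {"close_older": "1 hour", "stat_interval": "1 second"}

for name in old_settings:
    same = to_seconds(old_settings[name]) == to_seconds(new_settings[name])
    print(name, same)
# close_older True
# stat_interval True
```

Under this model, close_older and stat_interval end up with identical values in both configs, leaving the removal of ignore_older as the only effective difference.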
My assumption is that Logstash 6.3.2 interpreted ignore_older being set to 0 as false, thus disabling ignore_older, but in version 6.4 and higher that value is now interpreted as an actual time value in seconds, so any file last modified more than 0 seconds ago gets ignored. I haven’t dug deeply into the source code, but everything I have experienced points to that being the issue.
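That hypothesis can be made concrete with a small model. This is a hypothetical sketch of the two interpretations described above, not the plugin’s source code: is_ignored_old treats 0 as “disabled”, while is_ignored_new takes 0 literally as a 0-second window. The function names, the strict “older than” comparison, and the 600-second example age are all assumptions made for illustration.

```python
import time


def is_ignored_old(mtime, now, ignore_older):
    """Assumed 6.3.2 behavior: a falsy ignore_older (e.g. 0) disables the check."""
    if not ignore_older:
        return False
    return now - mtime > ignore_older


def is_ignored_new(mtime, now, ignore_older):
    """Assumed 6.4+ behavior: only an unset value (None) disables the check;
    0 is taken literally as "ignore files older than 0 seconds"."""
    if ignore_older is None:
        return False
    return now - mtime > ignore_older


now = time.time()
copied_log_mtime = now - 600  # e.g. an access_log copied into place 10 minutes ago

print(is_ignored_old(copied_log_mtime, now, 0))     # False: file is read
print(is_ignored_new(copied_log_mtime, now, 0))     # True: file is skipped
print(is_ignored_new(copied_log_mtime, now, None))  # False: option removed, file is read
```

In this model, deleting the ignore_older line corresponds to the None case, which lines up with the working config above.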
Regardless, this config now works and I am running Logstash 6.5 on macOS Mojave (10.14.1) without any issues.