Aim
To read all the logs from the Apache server and store them on S3.
Background
We have the following directive in httpd.conf:
ErrorLog "| /usr/bin/tee -a /var/log/httpd/error_log | /usr/bin/java -cp /usr/local/bin/CustomProducer/producer-1.0-SNAPSHOT-jar-with-dependencies.jar stdin.producer.StdInProducer /usr/local/bin/CustomProducer/Config.json >> /var/log/producer_init.log 2>&1"
This makes tee append each log line to the error_log file and also write it to stdout, which is piped into a Java producer for Apache Kafka.
The producer sends the data to a Kafka cluster, from which it eventually reaches Amazon S3.
The error_log file is rotated with logrotate, and the rotated files are also stored on S3.
Producer Code
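String msg;
this.stdinReader = new BufferedReader(new InputStreamReader(System.in));
try {
    // readLine() blocks until a full line arrives; it returns null only at EOF,
    // i.e. when the write end of the pipe has been closed
    while ((msg = this.stdinReader.readLine()) != null) {
        // Some processing which may introduce some delay (produces `message` from `msg`)
        // Send message to cluster
        this.producer.send(message);
    }
} catch (IOException e) {
    // stderr goes to /var/log/producer_init.log via the 2>&1 in the ErrorLog directive
    e.printStackTrace();
}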
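The loop above ignores the result of send(...). In the Kafka Java client, KafkaProducer.send is asynchronous: it returns a Future<RecordMetadata>, and a failed delivery is reported only through that Future or through a Callback passed to send(record, callback), so a failure here would go unnoticed. Records still buffered inside the producer when the JVM exits may also be lost unless flush() or close() is called first. A minimal, stdlib-only sketch of per-send accounting follows; the AsyncSender interface and the SendAccounting class are hypothetical stand-ins so the example runs without Kafka. With the real client, the same counting would go in Callback.onCompletion(RecordMetadata, Exception).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

public class SendAccounting {

    /** Hypothetical stand-in for an asynchronous producer; not a Kafka API. */
    interface AsyncSender {
        CompletableFuture<Void> send(String message);
    }

    final AtomicLong sent = new AtomicLong();
    final AtomicLong acked = new AtomicLong();
    final AtomicLong failed = new AtomicLong();

    /** Sends one message and records whether delivery succeeded or failed. */
    void send(AsyncSender sender, String message) {
        sent.incrementAndGet();
        sender.send(message).whenComplete((result, error) -> {
            if (error == null) {
                acked.incrementAndGet();
            } else {
                failed.incrementAndGet();
                // In the real setup stderr ends up in producer_init.log
                System.err.println("send failed: " + error + " for: " + message);
            }
        });
    }

    public static void main(String[] args) {
        SendAccounting acct = new SendAccounting();
        AtomicLong counter = new AtomicLong();
        // Fake sender that rejects every third message to simulate delivery errors
        AsyncSender flaky = msg -> {
            if (counter.incrementAndGet() % 3 == 0) {
                return CompletableFuture.failedFuture(new RuntimeException("simulated timeout"));
            }
            return CompletableFuture.completedFuture(null);
        };
        for (int i = 0; i < 9; i++) {
            acct.send(flaky, "line " + i);
        }
        System.out.println("sent=" + acct.sent + " acked=" + acct.acked + " failed=" + acct.failed);
    }
}
```

Logging `sent` against `acked + failed` periodically (and once more after the read loop ends) shows whether records are lost after readLine() has returned them, which is a different place from the pipe or the BufferedReader.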
Problem
When we compare the hourly logs in the Kafka bucket with those in the logrotate bucket, some log lines are intermittently missing from the Kafka bucket, with no specific pattern or time.
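To see exactly which lines are missing, the two copies of one hour can be compared as multisets of lines, which also tolerates the Kafka copy arriving in a different order. A minimal sketch, assuming both hourly files have already been downloaded and read into lists (for example with java.nio.file.Files.readAllLines); the class LogDiff and method missingLines are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LogDiff {

    /**
     * Returns the lines that occur in reference (e.g. the rotated error_log)
     * more often than in candidate (e.g. the Kafka copy of the same hour),
     * in the order they appear in reference. Duplicate lines are counted.
     */
    static List<String> missingLines(List<String> reference, List<String> candidate) {
        // Count how many times each line is available in the candidate copy
        Map<String, Integer> available = new HashMap<>();
        for (String line : candidate) {
            available.merge(line, 1, Integer::sum);
        }
        List<String> missing = new ArrayList<>();
        for (String line : reference) {
            int n = available.getOrDefault(line, 0);
            if (n > 0) {
                available.put(line, n - 1); // consume one matching occurrence
            } else {
                missing.add(line);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> rotated = List.of("a", "b", "b", "c", "d");
        List<String> kafka = List.of("a", "b", "d");
        System.out.println(missingLines(rotated, kafka)); // prints [b, c]
    }
}
```

The timestamps of the reported lines can then be checked against producer_init.log, for example to see whether gaps line up with producer restarts or errors rather than being truly random.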
Is this likely due to a pipe limit or a BufferedReader limit? How can we find out?
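No. Not even slightly. A Reader (including BufferedReader) is exactly as reliable as the underlying pipe or socket. A full pipe doesn't drop data either: the writer simply blocks until the reader catches up. If it's TCP, it can't lose data without resetting the connection.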
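This can be checked empirically. A minimal sketch using java.io.PipedInputStream and PipedOutputStream as an in-JVM stand-in for the OS pipe between tee and the producer (an analogy, not the same kernel mechanism): a writer thread pushes numbered lines through a deliberately small 1024-byte pipe while the reader sleeps periodically to mimic slow processing. When the pipe is full the writer waits, much as a blocking write() to a full OS pipe waits, and the reader verifies that every line arrives exactly once and in order. The class name SlowReaderDemo and its parameters are illustrative.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class SlowReaderDemo {

    /**
     * Pushes `lines` numbered lines through a small in-JVM pipe while the
     * reader sleeps `delayMillis` every 100 lines. Returns the number of
     * lines received; throws if any line is missing or out of order.
     */
    static int run(int lines, long delayMillis) throws Exception {
        PipedInputStream in = new PipedInputStream(1024); // small buffer, fills up quickly
        PipedOutputStream out = new PipedOutputStream(in);

        Thread writer = new Thread(() -> {
            // Closing the PrintWriter flushes and closes the pipe, giving the reader EOF
            try (PrintWriter pw = new PrintWriter(
                    new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
                for (int i = 0; i < lines; i++) {
                    // Buffered here; the underlying pipe write waits while the pipe is full
                    pw.println("line " + i);
                }
            }
        });
        writer.start();

        int received = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.equals("line " + received)) {
                    throw new IllegalStateException("expected line " + received + " but got: " + line);
                }
                received++;
                if (received % 100 == 0) {
                    Thread.sleep(delayMillis); // simulate slow per-message processing
                }
            }
        }
        writer.join();
        return received;
    }

    public static void main(String[] args) throws Exception {
        int n = 10_000;
        System.out.println("received " + run(n, 5) + " of " + n + " lines");
    }
}
```

A slow consumer therefore shows up as back-pressure on the writer, not as missing lines, which is why the search for the lost logs is better directed at what happens after readLine() returns.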