Tags: java, pipe, apache-kafka, producer-consumer

Is a pipe or BufferedReader in Java likely to lose data?


Aim

To read all the logs from the Apache server and store them on S3.

Background

We have the following statement in httpd.conf:

ErrorLog "| /usr/bin/tee -a /var/log/httpd/error_log | /usr/bin/java -cp /usr/local/bin/CustomProducer/producer-1.0-SNAPSHOT-jar-with-dependencies.jar stdin.producer.StdInProducer /usr/local/bin/CustomProducer/Config.json >> /var/log/producer_init.log 2>&1"

This writes the log to the error_log file and also to stdout, where it is consumed by a Java producer for Apache Kafka.

This producer eventually sends the data to the Kafka cluster and then to Amazon S3.

The error_log file is rotated by logrotate, and the rotated files are also stored on S3.

Producer Code

this.stdinReader = new BufferedReader(new InputStreamReader(System.in));
String msg;
try {
    while ((msg = this.stdinReader.readLine()) != null) {
        // Some processing builds `message` from msg (may introduce some delay)
        // Send message to cluster
        this.producer.send(message);
    }
} catch (IOException e) {
    // readLine() failed on stdin; log and shut down
}
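As a side note, the read loop above can be exercised in isolation to see whether a BufferedReader over a pipe ever drops lines. The sketch below (class and method names are mine, using only the standard library's in-process pipe classes) pushes a known number of lines through a pipe on one thread and counts what a BufferedReader receives on the other:

```java
import java.io.*;

public class PipeLossCheck {
    // Writes `n` numbered lines into one end of an in-process pipe on a
    // background thread, reads them back through a BufferedReader, and
    // returns how many lines arrived. If the reader dropped data, the
    // returned count would fall short of n.
    public static int countLinesThroughPipe(int n) throws IOException, InterruptedException {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out, 64 * 1024);

        Thread writer = new Thread(() -> {
            try (PrintWriter pw = new PrintWriter(new OutputStreamWriter(out))) {
                for (int i = 0; i < n; i++) {
                    pw.println("log line " + i);
                }
            } // closing the writer closes the pipe, giving the reader EOF
        });
        writer.start();

        int count = 0;
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            while (reader.readLine() != null) {
                count++; // stand-in for the per-line processing in the producer
            }
        }
        writer.join();
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countLinesThroughPipe(100_000));
    }
}
```

Every line written should come back out, regardless of how slowly the reading side consumes them; the writer simply blocks when the pipe buffer is full.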

Problem

When the hourly logs in the Kafka bucket are compared with those in the logrotate bucket, some log lines are intermittently missing, with no discernible pattern or timing.

Is this likely due to a pipe limit or a BufferedReader limit? How can I find out?


Solution

  • No. Not even slightly. The Reader is exactly as reliable as the underlying pipe or socket; it never discards data on its own. If the transport is TCP, data can't be lost without the connection being reset.
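As for finding out where the loss actually occurs: one empirical approach (a sketch; the tap-file path is hypothetical) is to tap the pipeline with an extra `tee` stage and compare line counts at each point. The first two commands below demonstrate the principle with a known input:

```shell
# Sanity check: a pipe plus tee delivers every line. Generate a known
# number of lines, tap them with tee, and count what emerges at the far
# end. All counts should match the input (100000 lines here).
seq 100000 | tee /tmp/pipe_tap.log | wc -l
wc -l < /tmp/pipe_tap.log

# Applied to the real setup: add a second `tee -a` stage before the
# Java producer in httpd.conf, then compare the tap file's hourly line
# count against error_log and against what reached Kafka. The stage
# whose count first drops short is where the loss is happening.
```

If the tap file before the producer matches error_log but Kafka falls short, the pipe and BufferedReader are exonerated and the loss is downstream, e.g. in how the producer sends or in the cluster itself.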