amazon-web-services kubernetes amazon-s3 fluentd

FluentD uploads the new file instead of the rotated file to S3 with hourly log rotation


We have a Python application that writes an hourly rotating log, with rotation scheduled at the start of each hour (10:00, 11:00, 12:00, ...). The application is deployed in a Kubernetes pod, and FluentD runs as a sidecar container that uploads these log files to an S3 bucket under the path

s3bucket/<id>/logs/%Y-%m-%d/%H/metrics

so we are trying to create a separate folder for each hour of the day and upload that hour's logs into it. In FluentD we have tried upload intervals of 60s and 30s, but each time (e.g. at 10:00) FluentD uploads the newly created log file for the new hour (which is blank or holds only a few logs for 10:00) to the S3 bucket under the previous hour's folder (9:00), overwriting the previous hour's logs (in our case, the logs from 9:00 to 9:59:59).

We have tried setting timekey to 60s and 30s, delaying the log rotation time, and adjusting other settings (rotate_wait, refresh_interval) so that uploads keep going to the proper folder until the start of the next hour, but delaying leads to overwritten logs and increasing the time leads to lost logs.

Logs for fluentd:

2022-04-06 10:59:00 +0000 [warn]: #1 def7zme94qc7q9folg5zly641/endpoints/p5wspkd85sr61/mep/2022-04-06/10/meplogs/logs_10.gz already exists, but will overwrite
2022-04-06 10:59:31 +0000 [warn]: #1 def7zme94qc7q9folg5zly641/endpoints/p5wspkd85sr61/mep/2022-04-06/10/meplogs/logs_10.gz already exists, but will overwrite
2022-04-06 10:59:57 +0000 [info]: #1 detected rotation of /var/log/cluster-env/gateway.log; waiting 120.0 seconds
2022-04-06 10:59:57 +0000 [info]: #0 detected rotation of /var/log/cluster-env/gateway.log; waiting 120.0 seconds
2022-04-06 10:59:57 +0000 [info]: #1 following tail of /var/log/cluster-env/gateway.log
2022-04-06 10:59:57 +0000 [info]: #0 following tail of /var/log/cluster-env/gateway.log
2022-04-06 10:59:57 +0000 [info]: #1 following tail of /var/log/cluster-env/gateway.log.2022-04-06_09
2022-04-06 11:00:01 +0000 [warn]: #1 def7zme94qc7q9folg5zly641/endpoints/p5wspkd85sr61/mep/2022-04-06/10/meplogs/logs_10.gz already exists, but will overwrite

So even at 11:00:01, logs are uploaded to the folder for hour 10.
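One plausible reading of these log lines, assuming the standard behaviour of Fluentd's time-keyed buffers: each event is assigned to the timekey window it falls into, the %H placeholders are expanded from the start of that window, and the chunk is flushed only once the window has ended (plus timekey_wait). The sketch below is a small Python model of that arithmetic, not Fluentd code. The names chunk_start, flush_time and object_key are made up for illustration, the key format is a shortened stand-in for the configured path, and the `<local_file_upload>` section is not modelled.

```python
from datetime import datetime, timedelta, timezone

TIMEKEY = 30       # seconds, from the <buffer time> section of the config
TIMEKEY_WAIT = 0   # seconds, from the same section

# Shortened stand-in for the configured path plus s3_object_key_format.
KEY_FORMAT = "mep/%Y-%m-%d/%H/meplogs/logs_%H.gz"


def chunk_start(event_time):
    """Start of the timekey window that the event falls into."""
    epoch = int(event_time.timestamp())
    return datetime.fromtimestamp(epoch - epoch % TIMEKEY, tz=timezone.utc)


def flush_time(event_time):
    """Earliest moment the chunk holding this event becomes flushable."""
    return chunk_start(event_time) + timedelta(seconds=TIMEKEY + TIMEKEY_WAIT)


def object_key(event_time):
    """Object key, expanded from the window start rather than the upload time."""
    return chunk_start(event_time).strftime(KEY_FORMAT)


early = datetime(2022, 4, 6, 10, 15, 2, tzinfo=timezone.utc)
late = datetime(2022, 4, 6, 10, 59, 45, tzinfo=timezone.utc)

print(flush_time(late))   # 2022-04-06 11:00:00+00:00
print(object_key(late))   # mep/2022-04-06/10/meplogs/logs_10.gz
print(object_key(early) == object_key(late))  # True
```

Under these assumptions, the upload logged at 11:00:01 is the 10:59:30 window being flushed, so it still targets the hour-10 folder. Because the key contains only %H, every 30-second chunk within an hour resolves to the same object, which matches the repeated "already exists, but will overwrite" warnings.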

FluentD config for logs

<worker 1>
    <source>
      tag "gateway-s3-logs"
      @label @gateway-s3-logs
      @type tail
      path "/var/log/cluster-env/gateway.log"
      pos_file "/var/log/cluster-env/gateway.log-s3-container-log-in-tail.pos"
      read_from_head true
      follow_inodes true
      refresh_interval 5
      rotate_wait 120
      <parse>
        @type "none"
        unmatched_lines 
      </parse>
    </source>
    <label @gateway-s3-logs>
      <match gateway-s3-logs>
        @type s3
        s3_bucket "sranjha-log-test"
        s3_region "us-west-2"
        path "def7zme94qc7q9folg5zly641/endpoints/p5wspkd85sr61/mep/%Y-%m-%d/%H/meplogs"
        s3_object_key_format "%{path}/logs_%H.gz"
        check_apikey_on_start false
        overwrite true
        utc 
        <buffer time>
          @type "file"
          path "/tmp/fluentd/mep-logs/logs/out-s3-buffer*"
          chunk_limit_size 64MB
          flush_at_shutdown true
          timekey 30
          timekey_wait 0
          retry_timeout 30s
          retry_type exponential_backoff
          retry_exponential_backoff_base 2
          retry_wait 1s
          retry_randomize true
          disable_chunk_backup true
          retry_max_times 5
        </buffer>
        <local_file_upload>
          file_path "/var/log/emr-on-cluster-env/gateway.log"
        </local_file_upload>
        <secondary>
          @type "secondary_file"
          directory "/var/log/fluentd/error/"
          basename "s3-mep-error.log"
        </secondary>
        <format>
          utc 
          localtime false
        </format>
        <inject>
          localtime false
        </inject>
      </match>
    </label>
  </worker>
  <worker 1>

So, is there a way to make FluentD write logs up to hh:59:59 into the folder for that hour (hh), and logs from (hh+1):00:00 onward into the folder for the new hour (hh+1)?


Solution

  • Here is our config, with which we send logs to S3 every minute:

    type s3
    <template>
      s3_bucket "mybucket"
      s3_region "my_region"
      path my_path/
      s3_object_key_format %{path}%{time_slice}_%{index}.%{file_extension}
      time_slice_format ${tag}/YEAR=%Y/MONTH=%m/DAY=%d/HOSTNAME=${hostname}/HOUR=%H/%M
      <format>
        @type json
      </format>
      store_as gzip
      <buffer time>
        timekey 30
        @type file
        path /var/log/td-agent/buffer/s3/${tag}
        timekey_wait 1m
        chunk_limit_size 50m
        flush_at_shutdown true
      </buffer>
    </template>
    

    The last log object we get for an hour has the form 59_1.gz, and the last message inside this gzip is timestamped "2022-04-07T18:59:44.777+0300".
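To see why this key format avoids overwrites, here is a small Python model of how the object name could be derived. It assumes that %{time_slice} is expanded from the start of the chunk's timekey window and that %{index} resolves to the first index not already taken. The helper names time_slice and next_key are made up for illustration, the ${tag} and ${hostname} parts of time_slice_format are left out, and a plain set stands in for the bucket listing.

```python
from datetime import datetime, timedelta, timezone

TZ = timezone(timedelta(hours=3))  # matches the +0300 offset in the timestamp
TIMEKEY = 30                       # seconds, from the <buffer time> section

# time_slice_format from the config, with ${tag} and ${hostname} omitted.
TIME_SLICE_FORMAT = "YEAR=%Y/MONTH=%m/DAY=%d/HOUR=%H/%M"


def time_slice(event_time):
    """Expand the time slice from the start of the event's timekey window."""
    epoch = int(event_time.timestamp())
    start = datetime.fromtimestamp(epoch - epoch % TIMEKEY, tz=TZ)
    return start.strftime(TIME_SLICE_FORMAT)


def next_key(event_time, existing, path="my_path/"):
    """Model %{index}: take the first index whose key is not in use yet."""
    index = 0
    while True:
        key = f"{path}{time_slice(event_time)}_{index}.gz"
        if key not in existing:
            existing.add(key)
            return key
        index += 1


uploaded = set()  # stand-in for the objects already present in the bucket
first = next_key(datetime(2022, 4, 7, 18, 59, 10, tzinfo=TZ), uploaded)
second = next_key(datetime(2022, 4, 7, 18, 59, 44, tzinfo=TZ), uploaded)

print(first)   # my_path/YEAR=2022/MONTH=04/DAY=07/HOUR=18/59_0.gz
print(second)  # my_path/YEAR=2022/MONTH=04/DAY=07/HOUR=18/59_1.gz
```

With a 30-second timekey and a per-minute time slice, the two windows of minute 59 share a slice and differ only by index. This is consistent with the hour's final object being named 59_1.gz and containing a message from 18:59:44, and with nothing from that hour landing under HOUR=19.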