[Fluentd] How to unzip files in fluentd


I am trying to process log files with a .gz extension in fluentd using the cat_sweep plugin, without success. As the config below shows, I want to process all files under /opt/logfiles/*. However, when a file is gzipped, cat_sweep cannot process it and starts deleting it. If I unzip the file manually inside /opt/logfiles/, cat_sweep processes it correctly.

<source>
   @type cat_sweep
   file_path_with_glob /opt/logfiles/*
   format none
   tag raw.log
   waiting_seconds 0
   remove_after_processing true
   processing_file_suffix .processing
   error_file_suffix .error
   run_interval 5
</source>

So now I need a plugin that can unzip a given file. I searched for one and came close with the in_exec plugin, which acts like a terminal and would let me run something like gzip -d file_path.

Link to the plugin:

http://docs.fluentd.org/v0.12/articles/in_exec

But the problem I see here is that I cannot pass the path of the file to be unzipped at run time.

Can someone help me with some pointers?


Solution

  • You can still achieve this with the in_exec plugin. Create a shell script that accepts two arguments: the folder to search for .gz files and a wildcard pattern to match file names. Inside the script, unzip the files in that folder that match the pattern. The invocation should look like:

    sh unzip.sh <folder_path_to_monitor> <wildcard_to_files>

    Then use that command in an in_exec source in your config, which will look like:

    <source>
      @type exec
      format json
      tag unzip.sh
      command sh unzip.sh <folder_path_to_monitor> <wildcard_to_files>
      run_interval 10s
    </source>
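
The answer describes unzip.sh without showing it, so here is a minimal sketch of what such a script might look like. The function name unzip_matching and the {"unzipped": ...} output line are assumptions made for illustration; they do not come from the original answer or from any plugin. The JSON line is printed only on the assumption that in_exec with format json wants parseable output from the command, and file names containing quotes would break it.

```shell
#!/bin/sh
# unzip.sh (hypothetical sketch): decompress .gz files in a folder.
# Usage: sh unzip.sh <folder_path_to_monitor> '<wildcard_to_files>'

unzip_matching() {
    dir=$1
    pattern=$2
    # $pattern is left unquoted on purpose so the shell expands the wildcard.
    for f in "$dir"/$pattern; do
        case $f in
            *.gz)
                # Skip directories and the literal pattern left when nothing matches.
                [ -f "$f" ] || continue
                # gzip -d replaces file.gz with file in the same folder.
                if gzip -d "$f"; then
                    # Assumption: one JSON object per line suits "format json".
                    printf '{"unzipped":"%s"}\n' "${f%.gz}"
                fi
                ;;
        esac
    done
}

# Run only when both arguments are supplied.
if [ "$#" -ge 2 ]; then
    unzip_matching "$1" "$2"
fi
```

When calling the script, quote the wildcard (for example '*.gz') so that it reaches the script unexpanded. If the .gz files arrive in the same folder that cat_sweep watches, cat_sweep may pick them up before this script runs; decompressing from a separate staging folder is one way to avoid that.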