How to get a program's std-out to fluentd (without docker)


Scenario:

You write a program in R or Python that needs to run on Linux or Windows, and you want to log its std-out (JSON-structured and unstructured) and std-error (mostly unstructured) to a Fluentd instance. Adding a new program or starting another instance should not require updating the Fluentd configuration, and the applications will not (yet) be running in a Docker environment.

Question:

How do you send "logs" from a bunch of programs to a Fluentd instance without performing a curl call for every log entry that your application originally wrote to std-out?

When the application needs a UDP or TCP 'connection' in order to run, it becomes harder to debug, and the std-out of every dependency of your program has to be captured and parsed just to get its logging passed through.

Thoughts:

Alternatively, the question could be: how do you accept a 'connection' object that can point either to a file or to a TCP connection, so that switching between std-out and a TCP destination is a matter of changing a single value?
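
To make that idea concrete, here is a minimal Python sketch of such a switchable destination. The names `open_sink` and `log_line` and the `tcp://host:port` convention are made up for illustration; whether the receiving end can parse the lines depends entirely on how that end is configured.

```python
import socket
import sys
from urllib.parse import urlparse


def open_sink(destination: str):
    """Return a writable text stream for the given destination (hypothetical helper).

    "-"                -> the process's own std-out
    "tcp://host:port"  -> a TCP connection wrapped as a text stream
    anything else      -> treated as a file path, opened in append mode
    """
    if destination == "-":
        return sys.stdout
    if destination.startswith("tcp://"):
        url = urlparse(destination)
        conn = socket.create_connection((url.hostname, url.port))
        # makefile() gives the socket the same write()/flush() API as a file.
        return conn.makefile("w", encoding="utf-8")
    return open(destination, "a", encoding="utf-8")


def log_line(sink, message: str) -> None:
    """Write one newline-terminated log line and flush it immediately."""
    sink.write(message.rstrip("\n") + "\n")
    sink.flush()
```

With this, the program only ever calls `log_line(sink, ...)`, and the single value passed to `open_sink` decides where the lines go.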

I like the 'tail' input plugin, which could be what I am looking for, but then:

  1. the original log file never appears to stop growing (will the tail position value reset when the file is simply removed? I couldn't find this behaviour documented), and
  2. it seems to require reconfiguring fluentd for every new program that you start on that server (if it logs to another file); I would highly prefer to keep that configuration on the program side...

I built an EFK stack with the docker log driver set to fluentd, which does not seem to be a fully solid solution either, but without docker I already get stuck setting up a basic configuration (not referring to fluent.conf here).


Solution

  • TL;DR

    • std-out -> fluentd: Redirect the program's output to a file when launching it. On Linux, use logrotate; you will love it.
    • Windows: use fluent-bit.
    • App-side config: use a single (or predictable) log location and the fluentd/fluent-bit 'in_tail' plugin.

    Logging in general:

    It's recommended to always write application output to a file. If std-out must end up in a file, redirect the program's output at startup. To give the fluentd configuration more flexibility, send std-out and std-error to separate files (just as Apache does):

    My_program.exe Do some crazy stuff > my_out_file.txt 2> my_error_file.txt
    

    This lets fluentd read from these files.
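
If the program is started from a launcher script rather than a shell, the same redirection can be sketched in Python. This is only an illustration under assumptions: `run_with_log_files` is a hypothetical helper name, and the file names are the placeholders from the command above.

```python
import subprocess
import sys
from pathlib import Path


def run_with_log_files(command, out_path, err_path):
    """Run `command`, appending its std-out and std-error to separate files.

    Returns the exit code of the child process.
    """
    out_path, err_path = Path(out_path), Path(err_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    err_path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode, so restarting the program does not wipe earlier log lines.
    with open(out_path, "ab") as out, open(err_path, "ab") as err:
        return subprocess.run(command, stdout=out, stderr=err).returncode
```

For example, `run_with_log_files(["My_program.exe", "Do", "some", "crazy", "stuff"], "my_out_file.txt", "my_error_file.txt")` would mirror the shell redirection. The program itself stays unchanged and can still be run in a terminal for debugging.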

    Windows:

    For Windows systems, use fluent-bit; it likely solves the problem of aggregating the logs of programs running on Windows. Windows support was implemented only recently.

    fluent-bit supports:

    1. The 'tail' plugin, which records each file's 'inode' value (a unique file identifier that is insensitive to renaming) and its 'index' value (called 'pos' in the full-blown 'fluentd' application) in a SQLite3 database. Data it cannot parse is assigned to a certain key ('log' by default).
    2. Windows machines, but note that it cannot buffer to disk, so make sure that a lost connection, or any other issue with the output, is re-established or fixed in time; otherwise you will run into OOM issues.
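
To illustrate why tracking the inode together with a position makes tailing robust against renames (for example by logrotate), here is a small Python sketch of the idea. It is not fluent-bit's implementation: the function name, table name, and column names are invented for this example.

```python
import os
import sqlite3


def read_new_lines(log_path, db_path):
    """Return complete lines appended to `log_path` since the previous call.

    The (inode, offset) pair is stored in a SQLite file, so the read position
    survives restarts and follows the file when it is renamed.
    """
    db = sqlite3.connect(db_path)
    try:
        db.execute(
            "CREATE TABLE IF NOT EXISTS positions "
            "(inode TEXT PRIMARY KEY, offset INTEGER NOT NULL)"
        )
        with open(log_path, "rb") as f:
            st = os.fstat(f.fileno())
            key = str(st.st_ino)  # stored as text to avoid integer overflow
            row = db.execute(
                "SELECT offset FROM positions WHERE inode = ?", (key,)
            ).fetchone()
            offset = row[0] if row else 0
            if offset > st.st_size:
                offset = 0  # the file was truncated; start over
            f.seek(offset)
            lines = []
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # incomplete line; pick it up on the next call
                offset += len(raw)
                lines.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
        db.execute(
            "INSERT OR REPLACE INTO positions (inode, offset) VALUES (?, ?)",
            (key, offset),
        )
        db.commit()
        return lines
    finally:
        db.close()
```

Because the position is keyed by inode rather than by file name, a renamed file continues from where reading stopped, and a new file created under the old name starts at offset 0.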

    App-side config:

    The tail plugin can monitor a folder, which makes it practical to keep the configuration on the side of your program. Just make sure that your different applications write their logs to a predictable directory.
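
On the program side, "predictable" can be as simple as a fixed directory plus a naming pattern. The Python sketch below shows one way to do that with the standard `logging` module; the helper name `make_file_logger`, the `<app_name>-<pid>.log` pattern, and the JSON field names are assumptions chosen for this example, not a convention required by fluentd or fluent-bit.

```python
import json
import logging
import os
from pathlib import Path


class JsonLineFormatter(logging.Formatter):
    """Format each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def make_file_logger(app_name, log_dir):
    """Create a logger that writes to <log_dir>/<app_name>-<pid>.log."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    # Including the PID gives every running instance its own file.
    path = log_dir / f"{app_name}-{os.getpid()}.log"
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger(app_name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger, path
```

A tail input whose path is a wildcard over that directory (for instance `*.log`) would then pick up a new program or a new instance without any change to the collector's configuration.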

    Fluent-bit setup/config:

    For Linux, just use fluentd (unless more than 100000 messages per second are required, which is where fluent-bit becomes your only choice).

    For Windows, install fluent-bit and make it run as a daemon (an almost funny solution).

    There are 2 execution methods:

    1. Providing the configuration directly via the command line
    2. Using a config file (example included in the zip) and referring to it with the -c flag.

    Directly from the command line

    An example execution that does not use a configuration file:

    PS .\bin\fluent-bit.exe -i winlog -p "channels=Setup,Windows PowerShell" -p "db=./test.db" -o stdout -m '*'
    

    -i declares the input method. Currently, only a few plugins have been implemented; see the help output below.

    PS fluent-bit.exe --help
    
    Available Options
      -b  --storage_path=PATH       specify a storage buffering path
      -c  --config=FILE     specify an optional configuration file
      -f, --flush=SECONDS   flush timeout in seconds (default: 5)
      -F  --filter=FILTER    set a filter
      -i, --input=INPUT     set an input
      -m, --match=MATCH     set plugin match, same as '-p match=abc'
      -o, --output=OUTPUT   set an output
      -p, --prop="A=B"      set plugin configuration property
      -R, --parser=FILE     specify a parser configuration file
      -e, --plugin=FILE     load an external plugin (shared lib)
      -l, --log_file=FILE   write log info to a file
      -t, --tag=TAG         set plugin tag, same as '-p tag=abc'
      -T, --sp-task=SQL     define a stream processor task
      -v, --verbose         increase logging verbosity (default: info)
      -s, --coro_stack_size Set coroutines stack size in bytes (default: 98302)
      -q, --quiet           quiet mode
      -S, --sosreport       support report for Enterprise customers
      -V, --version         show version number
      -h, --help            print this help
    
    Inputs
      tail                  Tail files
      dummy                 Generate dummy data
      statsd                StatsD input plugin
      winlog                Windows Event Log
      tcp                   TCP
      forward               Fluentd in-forward
      random                Random
    
    Outputs
      counter               Records counter
      datadog               Send events to DataDog HTTP Event Collector
      es                    Elasticsearch
      file                  Generate log file
      forward               Forward (Fluentd protocol)
      http                  HTTP Output
      influxdb              InfluxDB Time Series
      null                  Throws away events
      slack                 Send events to a Slack channel
      splunk                Send events to Splunk HTTP Event Collector
      stackdriver           Send events to Google Stackdriver Logging
      stdout                Prints events to STDOUT
      tcp                   TCP Output
      flowcounter           FlowCounter
    
    Filters
      aws                   Add AWS Metadata
      expect                Validate expected keys and values
      record_modifier       modify record
      rewrite_tag           Rewrite records tags
      throttle              Throttle messages using sliding window algorithm
      grep                  grep events by specified field values
      kubernetes            Filter to append Kubernetes metadata
      parser                Parse events
      nest                  nest events by specified field values
      modify                modify records by applying rules
      lua                   Lua Scripting Filter
      stdout                Filter events to STDOUT
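
For the tail use case described earlier, the flags from the help output above can be combined in the same way as in the winlog example. The Python sketch below only assembles the argument list; `fluent_bit_tail_args` is a hypothetical helper, and the property names `path`, `db`, `host` and `port` are the ones I would expect the tail input and forward output to accept, so check them against the documentation of your fluent-bit version.

```python
def fluent_bit_tail_args(exe, log_glob, db_path, host, port):
    """Build a fluent-bit command line that tails files and forwards them.

    Each -p applies to the plugin declared most recently before it, which is
    why the input properties follow -i and the output properties follow -o.
    """
    return [
        exe,
        "-i", "tail",
        "-p", f"path={log_glob}",   # assumed tail property: files to follow
        "-p", f"db={db_path}",      # assumed tail property: position database
        "-o", "forward",
        "-p", f"host={host}",       # assumed forward property: Fluentd host
        "-p", f"port={port}",       # assumed forward property: Fluentd port
        "-m", "*",
    ]
```

Passing the resulting list to `subprocess.run` avoids shell quoting problems with the `*` in the glob and the match pattern.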