java, python-3.x, logstash, filebeat

Filebeat vs. pushing logs directly to Logstash from the application


I am planning to architect a centralized logging system for one of our projects, which has multiple components written in Java, Python and Scala. I want to collect logs from different parts (REST server, Spark jobs, Airflow server) in Logstash and index them into Elasticsearch. I can see that there are libraries for both the Python and Java logging modules that push logs directly from the application to Logstash. I can also see Filebeat, which can be configured on the servers to push logs from files to Logstash. What is the advantage of using Filebeat rather than sending logs directly to Logstash? What is the best practice?


Solution

  • Here are a few pros and cons of both approaches:

    Application Logs => Logstash

    Pros:

    • Fewer components to manage and a straightforward pipeline

    Cons:

    • Congestion or an outage at Logstash may adversely affect your application, for example by slowing down logging calls or losing log events
    • Changing the log destination may require you to redeploy or restart your application
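
To make the first con concrete, here is a minimal sketch of direct shipping from a Python application using only the standard library. It assumes Logstash exposes a TCP input that accepts newline-delimited JSON; the host, port, class names and `build_direct_logger` are hypothetical and not taken from any Logstash client library. A bounded in-memory queue keeps slow or failed sends off the application thread, at the cost of dropping records when the queue fills up.

```python
import json
import logging
import logging.handlers
import queue
import socket


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


class TcpJsonHandler(logging.Handler):
    """Hypothetical handler that sends newline-delimited JSON over TCP.

    It opens a new connection for every record, which is simple but slow;
    a real client would reuse the connection and retry on failure.
    """

    def __init__(self, host, port, timeout=2.0):
        super().__init__()
        self.address = (host, port)
        self.timeout = timeout

    def emit(self, record):
        try:
            line = self.format(record) + "\n"
            with socket.create_connection(self.address, timeout=self.timeout) as conn:
                conn.sendall(line.encode("utf-8"))
        except Exception:
            # Logstash unreachable or slow: the record is lost.
            self.handleError(record)


def build_direct_logger(name, host, port, max_pending=1000):
    """Return (logger, listener); network I/O runs on the listener's thread."""
    pending = queue.Queue(maxsize=max_pending)  # bounded: full queue drops records
    sender = TcpJsonHandler(host, port)
    sender.setFormatter(JsonLineFormatter())
    listener = logging.handlers.QueueListener(pending, sender)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.QueueHandler(pending))
    return logger, listener
```

The caller would run `listener.start()` at startup and `listener.stop()` at shutdown. Even with the queue, an outage longer than the queue can absorb still loses logs, and changing `host` or `port` means touching application configuration.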

    Application Logs => Filebeat => Logstash

    Pros:

    • Filebeat is a lightweight utility that lets you decouple log processing from application logic
    • Changing the log destination is easy, and Filebeat natively supports load balancing across multiple Logstash instances
    • Logs can be enriched with additional fields, or processed conditionally, just by changing the Filebeat configuration, e.g. send logs for customer A to Logstash A
    • The log files on disk act as a local buffer, so logs are still delivered reliably to Logstash even if the Logstash process restarts or is unavailable for some time (provided the log files remain on disk for Filebeat to consume and Filebeat is configured appropriately)

    Cons:

    • Another component to manage in your application architecture
    • Requires additional system resources (though Filebeat's processing is usually very lightweight)
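
On the application side, the Filebeat approach can be as simple as writing structured log lines to a local file. The sketch below uses only the Python standard library; `build_file_logger`, the path and the rotation sizes are hypothetical choices. The Filebeat input that would tail this file is assumed to exist and is not shown.

```python
import json
import logging
import logging.handlers


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep the traceback inside the same JSON line.
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_file_logger(name, path, max_bytes=10_000_000, backups=5):
    """Return a logger that writes JSON lines to a size-rotated local file."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
    )
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

The application only knows about a file path, so moving or scaling Logstash does not require a redeploy. The rotation settings matter for the reliability point above: rotated files should stay on disk long enough for Filebeat to catch up after a Logstash outage.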