logstash, apache-storm, spark-streaming

What are the main differences between Logstash and Apache Storm/Spark Streaming?


I am looking for a distributed real-time computing system that will collect data from a Kafka server, process it, and then store it in Elasticsearch. I have already shortlisted a few candidates:

  • Apache Storm
  • Apache Spark Streaming
  • Logstash (which is usually described as an ETL (Extract, Transform, Load) tool)

I have already found several tutorials comparing Storm and Spark Streaming. However, I have not found any that compare Logstash to Storm and Spark Streaming. This is confusing for me: I am already familiar with Logstash, but I want to be sure that I select the right tool for my needs.

Thank you in advance


Solution

  • Logstash is a data collection engine with real-time capabilities. It supports analysis, archiving, monitoring, and alerting based on pre-defined metrics. In other words, Logstash is a specific product: a ready-made solution for one kind of job.
  • Apache Spark and Storm are very general distributed real-time computation systems. In other words, they are frameworks/libraries for general-purpose processing, and you write the processing logic yourself.
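
To make the distinction more concrete, here is a minimal plain-Python sketch. It does not use Logstash, Storm, or Spark; it only mimics the two styles of work. Everything in it is an assumption made for illustration: `EVENTS` is a hypothetical in-memory stand-in for messages read from Kafka, `index` is a plain list standing in for an Elasticsearch index, and the log line format and function names are made up. The first function is roughly the kind of per-event parsing an ETL pipeline does; the second is an example of custom stateful computation across events, which is where a general framework tends to be the better fit.

```python
from collections import Counter

# Hypothetical in-memory stand-in for messages read from a Kafka topic.
EVENTS = [
    {"ts": 0, "line": "GET /index.html 200"},
    {"ts": 4, "line": "GET /missing 404"},
    {"ts": 11, "line": "POST /login 200"},
    {"ts": 13, "line": "GET /missing 404"},
]


def etl_transform(event):
    """ETL-style step: parse one event into fields, independently of all others."""
    method, path, status = event["line"].split()
    return {"ts": event["ts"], "method": method, "path": path, "status": int(status)}


def windowed_status_counts(events, window_seconds):
    """Framework-style step: stateful aggregation over tumbling time windows."""
    counts = {}
    for e in events:
        # Map each timestamp to the start of the window it falls into.
        window_start = (e["ts"] // window_seconds) * window_seconds
        counts.setdefault(window_start, Counter())[e["status"]] += 1
    return counts


# 'index' is a plain list standing in for an Elasticsearch index.
index = [etl_transform(e) for e in EVENTS]

# Custom computation over the parsed stream: status counts per 10-second window.
windows = windowed_status_counts(index, window_seconds=10)

for start in sorted(windows):
    print(start, dict(windows[start]))
```

If the job is essentially the first function (parse, enrich, ship to Elasticsearch), a configured Logstash pipeline is likely enough. If it looks more like the second (custom logic that keeps state across many events), that points towards Storm or Spark Streaming.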