Tags: python, tornado, apache-kafka

Possible producers to get tracking data into Kafka


I'm new to Kafka, and I've read about data processing and further analysis (for example with Spark) on top of Kafka, but nothing about the actual process of getting data into Kafka. I know this is the job of producers, but how would I, for instance, send tracking data from a web app into Kafka? Should I use nginx log files as the source for a producer, or a server that can write directly to Kafka (for example Tornado with a Python Kafka library)? How would you build a very simple analytics tool that takes data from GET requests and puts it into Kafka for further processing?

Any remarks or comments, even just small hints, would help me get my head around this.


Solution

  • If you have the option of using a server that can write directly to Kafka (or of integrating a producer into your application code) without other drawbacks, I would definitely do that and avoid the whole log-file parsing step. In that case you connect any analytics solution downstream as a Kafka consumer and stream the data into it (see the sketch below).

    If you decide to touch disk on the web-app servers first instead, there are many solutions for parsing log files and forwarding them to Kafka: Flume/Flafka, Logstash, kafkacat, etc. Have a look at the Kafka ecosystem page. Some of these options also let you transform the data before it reaches the Kafka brokers, which can be valuable in certain cases.
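
    For the direct-write option, here is a minimal sketch of what a tracking endpoint could look like with Tornado and the kafka-python package. The broker address, the topic name ("tracking"), and the /track path are assumptions for illustration; any Kafka client library would work along the same lines.

    ```python
    import json

    import tornado.ioloop
    import tornado.web
    from kafka import KafkaProducer

    # Assumed setup: a Kafka broker on localhost:9092 and a topic
    # named "tracking" -- adjust both for your environment.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )


    class TrackingHandler(tornado.web.RequestHandler):
        """Accepts GET requests like /track?event=click&page=/home
        and forwards the query parameters to Kafka."""

        def get(self):
            event = {name: self.get_argument(name) for name in self.request.arguments}
            # send() is asynchronous; kafka-python batches and flushes
            # messages to the broker in the background.
            producer.send("tracking", event)
            self.set_status(204)  # no response body needed for a tracking endpoint


    def make_app():
        return tornado.web.Application([(r"/track", TrackingHandler)])


    if __name__ == "__main__":
        make_app().listen(8888)
        tornado.ioloop.IOLoop.current().start()
    ```

    You could then hit the endpoint with something like `curl 'http://localhost:8888/track?event=click&page=/home'` and consume the "tracking" topic downstream with Spark or any other Kafka consumer.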