Search code examples
nginxflume

Using NGINX to forward tracking data to Flume


I am working on providing analytics for our web property based on instrumentation data we collect via a simple image beacon. Our data pipeline starts with Flume, and I need the fastest possible way to parse query string parameters, form a simple text message and shove it into Flume.

For performance reasons, I am leaning towards nginx. Since serving static image from memory is already supported, my task is reduced to handling the querystring and forwarding a message to Flume. Hence, the question:

What is the simplest reliable way to integrate nginx with Flume? I am thinking about using syslog (Flume supports syslog listeners), but I struggle with how to configure nginx to forward custom log messages to a syslog (or just TCP) listener running on a remote server and on a custom port. Is it possible with existing 3rd party modules for nginx or would I have to write my own?

Separately, anything existing you can recommend for writing a fast $args parser would be much appreciated.

If you think I am on a completely wrong path and can recommend something better performance-wise, feel free to let me know.

Thanks in advance!


Solution

  • You should parse nginx log file like tail -f do and then pass results to Flume. It will be the most simple and reliable way. The problem with syslog is that it blocks nginx and may completely stuck under high-load or if something goes wrong (this is why nginx doesn't support it).