I work on a project with about half a million active users. We track a lot of user interaction events (clicks, etc.), and from these events we need to compute statistics about user behavior. Currently the statistics are computed by cron background jobs.
We would like the statistics to be as close to "online" (real time) as possible, rather than anywhere from 0 to 30 minutes stale. We also want to compute many more statistics, so the solution has to be scalable.
My idea is to have event queues that the frontend app pushes to and that daemons pull from, with each daemon processing events and updating its statistics incrementally (a rough sketch of what I mean is below). Daemons can be added as needed, and there can be different kinds of daemons for different statistics. Would you recommend this approach?
Is there a framework for this kind of data processing? Links to any resources will be very helpful.
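To make the idea concrete, here is a minimal, untested sketch of the producer/queue/daemon pattern I have in mind. An in-process BlockingQueue stands in for whatever real message broker would sit between the frontend and the daemons, and the names (Event, CountingDaemon) are just placeholders:

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class QueueStatsSketch {

    // A single user-interaction event as pushed by the frontend.
    static class Event {
        final String userId;
        final String type; // e.g. "click"
        Event(String userId, String type) { this.userId = userId; this.type = type; }
    }

    // One kind of daemon: keeps a running count per event type,
    // updated incrementally as events arrive (no batch recomputation).
    static class CountingDaemon implements Runnable {
        private final BlockingQueue<Event> queue;
        private final Map<String, AtomicLong> countsByType = new ConcurrentHashMap<>();

        CountingDaemon(BlockingQueue<Event> queue) { this.queue = queue; }

        @Override
        public void run() {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Event e = queue.take(); // blocks until an event is available
                    countsByType
                        .computeIfAbsent(e.type, k -> new AtomicLong())
                        .incrementAndGet(); // statistic is always up to date
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<Event> queue = new LinkedBlockingQueue<>();

        // More daemons (or different daemon kinds on separate queues)
        // can be started to scale out.
        new Thread(new CountingDaemon(queue)).start();

        // The "frontend" pushing events.
        queue.put(new Event("user-42", "click"));
        queue.put(new Event("user-7", "pageview"));
    }
}
```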
Update: Twitter Storm seems to be what I was looking for: https://github.com/nathanmarz/storm.
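In case it helps anyone else, here is a rough, untested sketch of how the idea above seems to map onto Storm's model: a spout plays the role of the event source feeding the topology, and bolts play the role of the daemons, updating their statistics incrementally. The class names (ClickSpout, EventCountBolt) are placeholders; a real spout would read from whatever queue the frontend pushes to, rather than emitting fake clicks:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class ClickStatsTopology {

    // Placeholder spout: emits random fake clicks so the topology runs
    // locally; in a real setup it would pull events pushed by the frontend.
    public static class ClickSpout extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private Random rand;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            this.rand = new Random();
        }

        @Override
        public void nextTuple() {
            Utils.sleep(100);
            collector.emit(new Values("user-" + rand.nextInt(1000), "click"));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("userId", "eventType"));
        }
    }

    // Plays the role of one "daemon": keeps a running count per event type,
    // updated incrementally as tuples arrive.
    public static class EventCountBolt extends BaseBasicBolt {
        private final Map<String, Long> counts = new HashMap<String, Long>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            String type = tuple.getStringByField("eventType");
            Long current = counts.get(type);
            counts.put(type, current == null ? 1L : current + 1L);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Terminal bolt, emits nothing downstream.
        }
    }

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("events", new ClickSpout(), 1);
        // A parallelism hint of 4 means four bolt instances, i.e. "more
        // daemons"; fieldsGrouping keeps all tuples of one event type on
        // the same instance so its count stays consistent.
        builder.setBolt("counts", new EventCountBolt(), 4)
               .fieldsGrouping("events", new Fields("eventType"));

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("click-stats", new Config(), builder.createTopology());
    }
}
```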