Tags: hadoop, mapreduce, udp, apache-storm

UDP data stream handling with MapReduce


I am having trouble processing a real-time UDP stream with MapReduce. This is for a university project, and I want to use MapReduce to process the data. The UDP stream carries ship data from several AIS devices.

As far as I am aware, Apache Storm would be a solution for this, but I don't know whether I can incorporate MapReduce into Storm. I want to apply MapReduce concepts because ultimately I want to learn them.

I would also like some advice on the system architecture. The normal procedure is:

  • The UDP stream is received by the system.
  • The stream is decoded.
  • Real-time analytics are shown.
  • The data is stored for future retrieval.

So can anyone suggest the best way to do this? Can Apache Storm do this?


Solution

  • I'll answer the easy question first: Yes, Apache Storm can do what you want it to do.

    That said, any of the other 'big data' streaming tools can do this data processing as well. Besides Storm, these include Spark and Samza.

    If I were building this myself, I'd push the streaming data into a message queue, probably Kafka, then use Storm to pull individual messages out and process them. You can then store the result however you want. That could be onto disk, back into Kafka, or whatever makes sense in your case.

    Finally, MapReduce doesn't seem to be a good fit for your problem. MapReduce is designed for batch processing of data that has already been collected, whereas you are describing the processing of a continuous real-time stream.
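
To make the receive, decode, and analyse flow concrete, here is a minimal single-process sketch in Python. It is not the Kafka and Storm setup described above: in that design the receive loop would publish each message to a queue and the per-message work would live in a Storm bolt. Everything named here is an assumption made for illustration: the function names, port 10110, and the choice of a per-channel message count as the "analytic". Only the comma-separated NMEA envelope of each AIS sentence is parsed; the encoded payload is left untouched, and storage is omitted.

```python
import socket
from collections import Counter


def parse_envelope(line: str) -> dict:
    """Split one AIS sentence (e.g. '!AIVDM,...') into its envelope fields.

    The payload is NOT decoded here; a real system would pass it to an
    AIS decoding library.
    """
    fields = line.strip().split(",")
    if len(fields) != 7 or not fields[0].startswith("!"):
        raise ValueError(f"not an AIS sentence: {line!r}")
    return {
        "type": fields[0][1:],   # sentence type, e.g. "AIVDM"
        "channel": fields[4],    # radio channel field
        "payload": fields[5],    # still-encoded message body
    }


def process(line: str, counts: Counter) -> None:
    """Handle one message as it arrives: key it by channel, update the total.

    This is the streaming counterpart of a map step (message -> key) and a
    reduce step (sum per key), applied incrementally rather than in a batch.
    """
    msg = parse_envelope(line)
    counts[msg["channel"]] += 1


def serve(host: str = "0.0.0.0", port: int = 10110, max_datagrams=None) -> Counter:
    """Receive UDP datagrams and update the running counts.

    host, port and max_datagrams are hypothetical defaults for this sketch.
    """
    counts = Counter()
    seen = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while max_datagrams is None or seen < max_datagrams:
            data, _addr = sock.recvfrom(4096)
            seen += 1
            # One datagram may carry several sentences, one per line.
            for line in data.decode("ascii", errors="replace").splitlines():
                try:
                    process(line, counts)
                except ValueError:
                    continue  # skip anything that is not an AIS sentence
    return counts
```

Calling `serve()` blocks and keeps counting until interrupted, while `serve(max_datagrams=100)` returns the counts after 100 datagrams. The point of the sketch is that each message updates the result as soon as it arrives, which is the per-message model Storm uses, rather than waiting for a complete data set as a MapReduce job would.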