Consider a worker process that searches the web for particular data. A second process is needed to index the results of the first for later use. Indexing involves writing the raw data (search results) to a huge distributed HBase repository in a particular way. I can't predict how fast these two processes will be relative to each other. Either of them may also be temporarily down, and when it comes back up it needs to pick up the work where it left off. I'm using Java EE. This is how I currently plan to implement it.
I'd like expert comments on whether this design is appropriate. For instance, what if the second process continuously polls the table to see whether there are new records? Am I using the right technology, or is it overkill? Should I simplify my design, or am I missing something? If my solution is appropriate, is there anything I should keep in mind during the implementation? Thanks in advance.
I would stick to a simpler design if possible: ditch the MySQL staging table and use JMS alone.
So, something like this would do it:
Messaging exists to help with exactly these tasks. Polling a database table does much the same thing but is trickier to get right, so why reinvent the wheel when you have a persistent, transactional MOM (message-oriented middleware) that is designed for the job?
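To make the hand-off concrete, here is a minimal sketch of the producer/consumer shape. It is an assumption-laden illustration, not JMS code: a `java.util.concurrent.BlockingQueue` stands in for the JMS queue so the example runs without a broker, and it is in-memory only, so it has none of the persistence or transactions a real MOM gives you. The class name, the `result-N` payloads, and the `__END__` marker are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-memory stand-in for a JMS queue: NOT persistent, for illustration only.
final class QueueHandoffSketch {

    // Hypothetical marker telling the indexer that no more results will arrive.
    private static final String END_OF_STREAM = "__END__";

    // Crawler side: publish each raw search result and move on,
    // without caring whether the indexer is currently running.
    static void crawl(BlockingQueue<String> queue, int count) throws InterruptedException {
        for (int i = 1; i <= count; i++) {
            queue.put("result-" + i);
        }
        queue.put(END_OF_STREAM);
    }

    // Indexer side: block until a message is available, then process it.
    // A real indexer would write to HBase here instead of collecting into a list.
    static List<String> index(BlockingQueue<String> queue) throws InterruptedException {
        List<String> indexed = new ArrayList<>();
        while (true) {
            String message = queue.take();
            if (END_OF_STREAM.equals(message)) {
                return indexed;
            }
            indexed.add("indexed:" + message);
        }
    }

    // Runs the crawler to completion before the indexer starts,
    // mimicking an indexer that was down while results piled up.
    static List<String> run(int count) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        try {
            crawl(queue, count);
            return index(queue);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during hand-off", e);
        }
    }

    public static void main(String[] args) {
        // prints [indexed:result-1, indexed:result-2, indexed:result-3]
        System.out.println(run(3));
    }
}
```

The point of the sketch is that neither side polls or knows the other's speed: the crawler just enqueues, and the indexer drains whatever has accumulated whenever it is running. With JMS the queue lives in the broker, so with persistent delivery the backlog would also survive a restart of either process, and in Java EE the indexer side would typically be a message-driven bean rather than a hand-written loop.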