Search code examples
javataskdistributed-system

How to allocate tasks in distributed system?


I have large number of tasks and several worker servers. I want to allocate these tasks to these workers evenly, even if a worker server goes down.

My idea is that I split the tasks into several shards and send each shard to MQ. Each server reads a MessageQueue. I want the task to be processed as soon as possible. But how to deal with the situation that if a server goes down, the tasks in its MessageQueue cannot be consumed in a timely manner?

By -the-way, are there any JAVA frameworks that can help with this situation?


Solution

  • What you are describing is a cluster with shared message queues. As Thomas Timbul said, all the servers should read from the same message queue. If you are using IBM MQ you should ideally install the queue manager on a separate system and have the servers connect so that if one server goes down it does not affect the others.

    Each server will pull a message off the queue and process it on demand. Using a J2EE server you can specify the number of threads reading the queue (the number of MDBs) on each server. For example, in WebSphere it is the maxSessions setting on the port listener.

    If one server fails while processing a message the transaction manager should roll-back the transaction and the message will go back on the queue to be read by another server.

    If servers process messages at different rates, it doesn't matter as each server just pulls messages off the queue when they need them.

    Be careful with messages that can't be processed as they can cause the queue to be blocked. You need to have a retry count and a back-out queue to which bad messages are sent if they exceed the rery count. These are referred to as "poison" messages and are the subject of other questions on Stackoverflow and elsewhere.