Tags: java, multithreading, concurrent.futures

Java design that uses FixedThreadPool(s) and queues


I am working on a design of a program that will need to fetch results from a datastore and post those results to another system. The data that I am fetching is referenced by a UUID, and has other documents linked to it by UUIDs. I will be posting a lot of documents (>100K documents), so I would like to do this concurrently. I am thinking about the following design:

Get the list of documents from the datastore. Each document would have:

docId (UUID)
docData (json doc)
type1 (UUID)
type1Data (json)
type2 (UUID)
type2Data (json)
list<UUID> type3Ids
list of type3 data (json)
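A plain Java class makes that shape concrete. This is only a sketch; the field names mirror the list above and are illustrative, not an existing API:

```java
import java.util.List;
import java.util.UUID;

// Hypothetical container for one fetched document and its linked data.
// Plain fields keep the sketch short; real code would likely use a
// record or proper accessors.
public class Document {
    UUID docId;
    String docData;          // json doc
    UUID type1;
    String type1Data;        // json
    UUID type2;
    String type2Data;        // json
    List<UUID> type3Ids;     // links to type3 docs
    List<String> type3Data;  // json, one entry per type3 id
}
```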

The only data that I get from my first call are the docIds. I was thinking of pushing these documents into a queue and having a set of workers (fetchers) make the relevant calls back to the datastore to retrieve the data.

Retrieve the docData from the datastore and fill in the type1, type2 and type3 UUIDs
Do a batch get to retrieve all the type1, type2 and type3 docs
Push the results into another queue for posting to the other system
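A fetcher worker along those lines could look like this. It is a sketch under stated assumptions: Datastore is a hypothetical stand-in for your datastore client, and FetchedDoc a placeholder result type.

```java
import java.util.UUID;
import java.util.concurrent.BlockingQueue;

// Hypothetical datastore client interface (not a real library API).
interface Datastore {
    FetchedDoc getDocument(UUID docId);  // retrieves docData, fills in linked ids
    void batchGetLinked(FetchedDoc doc); // one batch call for all type1/type2/type3 docs
}

// Placeholder result type; in real code this would carry all the fields.
class FetchedDoc {
    UUID docId;
    FetchedDoc(UUID docId) { this.docId = docId; }
}

public class Fetcher implements Runnable {
    private final BlockingQueue<UUID> idQueue;         // first queue: doc ids
    private final BlockingQueue<FetchedDoc> postQueue; // second queue: fetched docs
    private final Datastore datastore;

    public Fetcher(BlockingQueue<UUID> idQueue,
                   BlockingQueue<FetchedDoc> postQueue,
                   Datastore datastore) {
        this.idQueue = idQueue;
        this.postQueue = postQueue;
        this.datastore = datastore;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                UUID docId = idQueue.take();                   // blocks when empty
                FetchedDoc doc = datastore.getDocument(docId);
                datastore.batchGetLinked(doc);
                postQueue.put(doc);                            // blocks when the post queue is full
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // allow orderly shutdown
        }
    }
}
```

A poster worker would be the mirror image: take() from the second queue and make the HTTP call to the other system.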

The second set of workers (posters) would read each document from the second queue and post the results to the second system.

One question that I have: should I create one FixedThreadPool of size X, or two FixedThreadPools of size X/2 each? Is there a danger of starvation if there are so many jobs in the first queue that the second queue would not get started until the first queue was empty?

The fetchers will be making network calls to talk to the database, so they seem more IO bound than CPU bound. The posters will also make network calls, but they run in the cloud in the same VPC as my code, so the two would be fairly close together.


Solution

  • Blocking Queue

    This is a pretty normal pattern.

    If you have two distinct jobs to do, use two distinct thread pools and make their size configurable so you can size them as needed / test different values on the deployment server.
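    One way to make the sizes configurable is to read them at startup. This is a sketch; the property names fetcher.threads and poster.threads, and the default sizes, are invented for illustration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Two distinct fixed pools whose sizes can be overridden per deployment
// via system properties, e.g. -Dfetcher.threads=16 on the command line.
public class Pools {
    static ExecutorService fetcherPool() {
        int n = Integer.getInteger("fetcher.threads", 8); // default is arbitrary
        return Executors.newFixedThreadPool(n);
    }

    static ExecutorService posterPool() {
        int n = Integer.getInteger("poster.threads", 4);  // default is arbitrary
        return Executors.newFixedThreadPool(n);
    }
}
```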

    It is common to use a blocking queue (BlockingQueue built into Java 5 and later) with a bounded size (say, 1000 elements for an arbitrary example).

    The blocking queue is thread-safe, so every thread in the first pool writes to it as fast as it can, and every thread in the second pool reads from it as fast as it can. If the queue is full, the write simply blocks, and if the queue is empty, the read simply blocks - nice and easy.
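    A minimal runnable sketch of that handoff, with arbitrary pool and queue sizes standing in for whatever your tuning settles on:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Two fixed pools joined by a bounded blocking queue: the first pool
// puts, the second takes, and the queue's capacity provides backpressure.
public class PipelineDemo {

    // Pushes 'jobs' items through the queue and returns how many the
    // consumer side processed.
    static int run(int jobs) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1000); // bounded; size is an example
        ExecutorService fetchers = Executors.newFixedThreadPool(4);    // sizes arbitrary; tune each pool
        ExecutorService posters  = Executors.newFixedThreadPool(4);
        CountDownLatch done = new CountDownLatch(jobs);

        for (int i = 0; i < jobs; i++) {
            final int item = i;
            fetchers.submit(() -> {
                try {
                    queue.put(item);   // blocks if the queue is full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            posters.submit(() -> {
                try {
                    queue.take();      // blocks if the queue is empty
                    done.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        done.await(); // latch reaches zero once every item has been taken
        fetchers.shutdown();
        posters.shutdown();
        return jobs - (int) done.getCount();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("processed " + run(100) + " items");
    }
}
```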

    You can tune the thread counts and run repeatedly to narrow down the best size for each pool.