
Scalable System Architecture/Design for Reading/Parsing Files


Background: I am designing a software application that reads millions of files, possibly many more, and either converts or simply parses them. Part of the requirement is to build a scalable, distributed system so that reading and parsing can be scaled as needed.

Basically, a minimally detailed list of file names is stored in one DB, and clients need to access that list to know which files to parse or convert next. The files themselves are on another server/location. While most of the pieces are designed, one critical piece that needs revisiting is how the file names are fed to the different clients.

I have two options now:

  1. Design a single service that sits next to the DB, channels all requests for file names, and feeds the clients. In this case, clients talk to the service (using a predefined protocol/format) and get the list.

  2. Design the clients to talk directly to the DB and implement the synchronization/channeling within the clients.

My only concern with the first option is whether it is a scalable architecture/design. Has anyone dealt with a situation in a scalable architecture where one resource becomes the critical point in scaling? (In my case, it could be the one service feeding all clients.)


Solution

  • I would suggest looking at message queues such as RabbitMQ (http://www.rabbitmq.com), Microsoft Message Queue (http://bit.ly/GMo4iI) and IBM Message Queue (http://bit.ly/GMo6qY), which already have scaling architectures in place.

    You can set up clients to request messages from the queue and configure each message body to contain the details of the file to be processed. The client can then delete the message from the queue once the file has been processed.

    You need to set up mechanisms to make sure the same file is not read by two clients at the same time, but this can be handled at the queue level, for example by configuring each client to consume from specific queues or message priorities.
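The pattern described above (one queue of file names, several clients pulling from it, each message removed only after its file has been processed) can be sketched roughly as follows. This is a minimal in-process illustration, not broker code: Python's standard `queue.Queue` and threads stand in for the message queue and the distributed clients, and `parse_file`, `worker` and `run` are hypothetical names introduced only for this example. A real deployment would replace the in-memory queue with a broker client.

```python
import queue
import threading


def parse_file(name):
    # Hypothetical placeholder for the real read/parse/convert step.
    return name.upper()


def worker(work, results, lock):
    # Each worker plays the role of one client pulling from the shared queue.
    while True:
        name = work.get()  # blocks until a message is available
        if name is None:   # sentinel: no more work for this worker
            work.task_done()
            return
        try:
            parsed = parse_file(name)
            with lock:
                results.append((name, parsed))
        except Exception:
            # Assumption: a real broker would redeliver an unacknowledged
            # message; this sketch simply skips the failed file.
            pass
        finally:
            # Stands in for "delete the message once the file is processed".
            work.task_done()


def run(file_names, num_workers=4):
    work = queue.Queue()  # in-memory stand-in for the message queue
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(work, results, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for name in file_names:
        work.put(name)    # one message per file
    for _ in threads:
        work.put(None)    # one sentinel per worker
    work.join()           # wait until every message has been handled
    for t in threads:
        t.join()
    return results
```

Because each `get()` hands a message to exactly one worker, no file name is processed twice, which is the property the queue is meant to provide in place of a hand-written feeding service. Calling `run(["a.txt", "b.txt"], num_workers=2)` returns one `(name, parsed)` pair per file, in whatever order the workers finish.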