java sockets rmi distributed grid-computing

Distributed computing platform in Java


I was asked to build a platform that downloads tweets from the Twitter streaming API. The basic idea is to have a controller that generates tasks describing what to download (keywords) and how to serialize the data. These tasks are sent to remote servers (on the same or a different network), which execute them and periodically save tweets to a DB. What I need is this:

  • Controller: Must have a connection to the fetchers so it can send tasks to them. Must validate all fetcher connections.
  • Fetcher: Should retrieve tweets from the Twitter streaming API based on the task's keywords. Only one task per fetcher. There should be no need to register a fetcher manually: just start it and it runs the tasks it receives.
  • DB: Must store the tweets' JSON periodically. As there will be many fetchers, I need something that avoids a bottleneck.
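One way to read the "store periodically without a bottleneck" requirement is to buffer tweets on each fetcher and write them in batches instead of one insert per tweet. The following is only a sketch under that assumption; `TweetBatchBuffer`, the batch size, and the `sink` callback (which would wrap whatever bulk insert the chosen DB offers) are hypothetical names, not part of the original setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical buffer that collects tweet JSON and hands it to a sink in batches. */
class TweetBatchBuffer {
    private final int batchSize;
    private final Consumer<List<String>> sink; // assumed to perform one bulk write per call
    private final List<String> pending = new ArrayList<>();

    TweetBatchBuffer(int batchSize, Consumer<List<String>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Adds one tweet and flushes automatically once the batch is full. */
    synchronized void add(String tweetJson) {
        pending.add(tweetJson);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Sends everything collected so far to the sink; a timer could call this periodically. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        sink.accept(batch);
    }
}
```

A fetcher would call `add` for every tweet it receives and schedule `flush` at a fixed interval, so the DB sees a few large writes per fetcher rather than a constant stream of small ones.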

Having said this, what I'm looking for is a good idea on how to implement it. Currently I'm using SSLSockets for the validation process. After that, I close the socket and use RMI to publish the fetchers and store their Registries on the server (Controller). It's working... more or less... but I'm not sure if it is a good idea to do it this way.

Do you have any ideas about how to implement a distributed computing platform? What should I use?

Thank you.


Solution

  • As far as I know, RMI only lets you execute code that already exists on the fetcher's side. But it sounds like you want to send code to the fetcher to be executed.

    In that case I would consider writing a custom ClassLoader and sending the class bytecode to the fetcher, which then loads and executes it.
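A minimal sketch of the ClassLoader idea, assuming Java 9 or later and assuming every task implements `Runnable` and has a no-argument constructor. `TaskClassLoader`, `instantiate`, and `EchoTask` are hypothetical names used only for illustration; how the bytes reach the fetcher is left out here.

```java
import java.lang.reflect.Constructor;

/** Hypothetical loader that defines exactly one class from bytes received from the controller. */
class TaskClassLoader extends ClassLoader {
    private final String className;
    private final byte[] bytecode;

    TaskClassLoader(String className, byte[] bytecode, ClassLoader parent) {
        super(parent);
        this.className = className;
        this.bytecode = bytecode.clone();
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // Reached only after the parent loaders failed to find the class.
        if (!className.equals(name)) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, bytecode, 0, bytecode.length);
    }

    /** Loads the task class from the given bytes and creates an instance of it. */
    static Runnable instantiate(String className, byte[] bytecode, ClassLoader parent)
            throws ReflectiveOperationException {
        ClassLoader loader = new TaskClassLoader(className, bytecode, parent);
        Class<?> taskClass = loader.loadClass(className);
        Constructor<?> constructor = taskClass.getDeclaredConstructor();
        constructor.setAccessible(true); // the task class may not be public
        return (Runnable) constructor.newInstance();
    }
}

/** Stand-in task; a real one would open the streaming connection for its keywords. */
class EchoTask implements Runnable {
    @Override
    public void run() {
        System.setProperty("demo.task.ran", "true");
    }
}
```

On the fetcher, the received class name and bytes would be passed to `TaskClassLoader.instantiate`, and the returned `Runnable` handed to a worker thread. Loading bytecode this way runs arbitrary code, so the fetcher should accept it only from a controller it has authenticated.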

    As a protocol I would recommend HTTP: there are relatively stable implementations around that also support TLS/SSL, and it takes a lot of socket-related pain away from you.
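To illustrate the HTTP suggestion, here is a rough sketch that uses only classes shipped with the JDK (Java 11 or later): the fetcher exposes a small endpoint and the controller posts a task to it. The class name `FetcherHttpEndpoint`, the `/task` path, the plain-text keyword body, and the 202/409 status codes are all assumptions made for the example. It uses plain HTTP on the loopback interface; a real deployment would presumably swap in `HttpsServer` with a configured `SSLContext`.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical fetcher-side endpoint that accepts at most one task. */
class FetcherHttpEndpoint {
    private final HttpServer server;
    private final AtomicReference<String> currentTask = new AtomicReference<>();

    FetcherHttpEndpoint(int port) throws IOException {
        // Port 0 lets the OS pick a free port; loopback only, for the sketch.
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/task", exchange -> {
            try {
                if (!"POST".equals(exchange.getRequestMethod())) {
                    exchange.sendResponseHeaders(405, -1); // -1 means no response body
                    return;
                }
                String task = new String(
                        exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
                // Only one task per fetcher: reject a second assignment.
                boolean accepted = currentTask.compareAndSet(null, task);
                exchange.sendResponseHeaders(accepted ? 202 : 409, -1);
            } finally {
                exchange.close();
            }
        });
    }

    void start() { server.start(); }

    void stop() { server.stop(0); }

    int port() { return server.getAddress().getPort(); }

    String currentTask() { return currentTask.get(); }

    /** Controller side: posts the keywords and returns the HTTP status code. */
    static int sendTask(URI fetcherTaskUri, String keywords)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
        HttpRequest request = HttpRequest.newBuilder(fetcherTaskUri)
                .POST(HttpRequest.BodyPublishers.ofString(keywords))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }
}
```

With this shape the controller only needs each fetcher's URL, and the same request could carry class bytecode instead of keywords if tasks are shipped as code.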