
Hazelcast or AppScale to manage parallel computational tasks over a shared dataset


We are starting a new project and looking for advice on a suitable platform. We are currently deciding between Hazelcast and AppScale, since our team’s combined (but limited) experience covers an older version of Hazelcast and GAE. Both can apparently also be set up on EC2, which may be the easiest way to meet the CPU demand we expect.

Problem Profile

1). Our data consists of many small records stored by date (but not always by time). Some are small numerical records (business stats, similar to daily weather info or stock market prices) and some are bulky text (log file entries). Data volumes are not huge: in the region of hundreds of records per day, between 1k and 50k each.

2). A very large number of instances of computationally expensive numerical models (think Monte Carlo simulations) operate constantly over fixed-size windows of the same data.

3). A number of monitoring agents make data available.

4). Larger sets of the same data (covering longer periods of time) are to be processed offline once daily.

With Hazelcast we would add incoming data to maps and use the Executor service to run models over the shared data. We would likely use Tomcat to provide minimal front-end access to the grid as required.
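To make the shape of this approach concrete, here is a minimal sketch that uses only JDK classes as local stand-ins: a `ConcurrentHashMap` in place of a Hazelcast map and a fixed thread pool in place of the Executor service. Hazelcast's `IMap` extends `ConcurrentMap` and its `IExecutorService` extends `java.util.concurrent.ExecutorService`, so the structure should carry over, but this is not Hazelcast code. The class name, the `model` and `runAll` helpers, and the toy resampling model are all hypothetical and chosen for illustration only.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class WindowedModelSketch {

    // Local stand-in for a shared map of daily numeric records keyed by date.
    static final ConcurrentMap<LocalDate, Double> DAILY = new ConcurrentHashMap<>();

    // One model instance: a toy Monte Carlo estimate (mean of values resampled
    // with replacement) over a fixed-size window of days ending at `end`.
    static Callable<Double> model(LocalDate end, int windowDays, int samples, long seed) {
        return () -> {
            List<Double> window = new ArrayList<>();
            for (int i = 0; i < windowDays; i++) {
                Double v = DAILY.get(end.minusDays(i));
                if (v != null) window.add(v); // dates with no record are skipped
            }
            if (window.isEmpty()) return Double.NaN;
            Random rng = new Random(seed);
            double sum = 0.0;
            for (int s = 0; s < samples; s++) {
                sum += window.get(rng.nextInt(window.size()));
            }
            return sum / samples;
        };
    }

    // Submit every model to the pool and collect results in submission order.
    static List<Double> runAll(List<Callable<Double>> models, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Double> out = new ArrayList<>();
            for (Future<Double> f : pool.invokeAll(models)) out.add(f.get());
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        LocalDate today = LocalDate.of(2024, 1, 10);
        for (int i = 0; i < 10; i++) DAILY.put(today.minusDays(i), 100.0 + i);
        List<Callable<Double>> models = new ArrayList<>();
        for (int m = 0; m < 4; m++) models.add(model(today, 5, 10_000, m));
        System.out.println(runAll(models, 4));
    }
}
```

On a real grid the tasks would need to be serializable so they can be shipped to other members, and each task would look the map up from its local Hazelcast instance instead of a static field; the lambda above would have to become a serializable class for that.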

With AppScale we would add tables per data type and use the Task Queues API to frame the numerical models. Servlets would be deployed to AppScale, as on GAE, to provide the front end.
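The task-queue framing can be sketched in the same way. The example below is not the App Engine Task Queue API: it uses a JDK `BlockingQueue` as a stand-in for a queue and a plain `Map` as a stand-in for a datastore table. What it tries to show is the design idea that each queued task carries only small parameters (which model, which window) and the handler fetches the data itself. All names here (`TaskQueueSketch`, `ModelTask`, `enqueue`, `handle`, `drain`) are hypothetical, and the "model" is just a window mean.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class TaskQueueSketch {

    // A task holds parameters only, never the data it will process.
    static final class ModelTask {
        final String modelId;
        final String windowEnd; // ISO date, e.g. "2024-01-10"
        final int windowDays;

        ModelTask(String modelId, String windowEnd, int windowDays) {
            this.modelId = modelId;
            this.windowEnd = windowEnd;
            this.windowDays = windowDays;
        }
    }

    // Local stand-in for a task queue.
    static final BlockingQueue<ModelTask> QUEUE = new LinkedBlockingQueue<>();

    // Producer side: describe one model run and queue it.
    static void enqueue(String modelId, String windowEnd, int windowDays) {
        QUEUE.add(new ModelTask(modelId, windowEnd, windowDays));
    }

    // Worker side: read the window from the table stand-in and compute its mean.
    static String handle(ModelTask t, Map<LocalDate, Double> table) {
        LocalDate end = LocalDate.parse(t.windowEnd);
        double sum = 0.0;
        int n = 0;
        for (int i = 0; i < t.windowDays; i++) {
            Double v = table.get(end.minusDays(i));
            if (v != null) { sum += v; n++; }
        }
        return t.modelId + ":" + (n == 0 ? "no-data" : String.valueOf(sum / n));
    }

    // Process queued tasks in FIFO order until the queue is empty.
    static List<String> drain(Map<LocalDate, Double> table) {
        List<String> results = new ArrayList<>();
        ModelTask t;
        while ((t = QUEUE.poll()) != null) results.add(handle(t, table));
        return results;
    }
}
```

In a GAE-style deployment the handler would typically be a servlet invoked once per task with those parameters, so each model run has to fit within whatever per-task time limit the platform enforces.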

Question

Should we use AppScale or Hazelcast for requirements like this? That is, for the problem as stated, are there any stand-out factors for or against either platform that we should consider?


Solution

  • If you prefer or require a distributed, service-oriented programming model (bag of tasks), then the answer is AppScale. If you prefer or require a parallel programming model (single-machine abstraction), then the answer is Hazelcast. AppScale is also a complete cloud platform (as opposed to only a datastore), which lets you do more with your app as it evolves. If you go with AppScale, you can adjust the time limit imposed on tasks and customize the platform with the libraries you want to use for your computationally expensive methods.