numpy hadoop amazon-ec2 machine-learning mahout

Amazon EC2 vs PiCloud


We are students trying to handle a dataset of about 140 million records and run a few machine learning algorithms on it. We are new to cloud solutions and to Mahout. The data currently lives in a PostgreSQL database, but that setup doesn't scale: read/write operations remain extremely slow even after extensive performance tuning. Hence we are planning to move to cloud-based services.

We have explored a few possible alternatives.

  1. Amazon cloud-based services (Mahout implementation)
  2. PiCloud with scikit-learn (we were planning to use the HDF5 format with NumPy)
  3. Please recommend any other alternatives.
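
For reference, the out-of-core idea behind option 2 can be sketched as below. This is only an illustration under stated assumptions: it uses `numpy.memmap` on a raw binary file as a stand-in for an HDF5 dataset (an HDF5 library would expose similar slice-based reads), and the function name, file name, and chunk size are hypothetical. It computes per-column means one chunk at a time, so the full table never has to fit in memory.

```python
import os
import tempfile

import numpy as np


def chunked_column_means(path, n_rows, n_cols, chunk_rows=100_000, dtype=np.float64):
    """Per-column means of an on-disk table, read one chunk at a time."""
    # memmap maps the file into virtual memory; rows are paged in on access.
    data = np.memmap(path, dtype=dtype, mode="r", shape=(n_rows, n_cols))
    totals = np.zeros(n_cols, dtype=np.float64)
    for start in range(0, n_rows, chunk_rows):
        chunk = data[start:start + chunk_rows]  # only this slice is touched
        totals += chunk.sum(axis=0)
    return totals / n_rows


# Small demo table (hypothetical data) written to a temporary file.
n_rows, n_cols = 1_000, 4
path = os.path.join(tempfile.mkdtemp(), "records.dat")
demo = np.memmap(path, dtype=np.float64, mode="w+", shape=(n_rows, n_cols))
demo[:] = np.arange(n_rows * n_cols, dtype=np.float64).reshape(n_rows, n_cols)
demo.flush()

means = chunked_column_means(path, n_rows, n_cols, chunk_rows=256)
print(means)
```

The same loop structure applies to any incremental statistic or to a learner that accepts data in batches; only the per-chunk update changes.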

Here are our questions:

  1. Which would give us better results (turnaround time) and be more cost-effective? Please mention any other alternatives.
  2. If we set up Amazon services, what data format should we use? If we use DynamoDB, will the cost shoot up?

Thanks


Solution

  • PiCloud is built on top of AWS, so either way you'll be using Amazon at the end of the day. The question is how much infrastructure you'll have to write yourself to get everything wired together. PiCloud gives some free usage to put it through its paces, so you might give it a shot initially. I haven't used it myself, but clearly they're trying to provide ease of deployment for machine-learning-type applications.

    It seems like you are after results rather than a cloud project, so I would look into either one of Amazon's services other than plain EC2, or a service such as PiCloud or Heroku that takes care of the bootstrapping.
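
Before committing to either route, a back-of-envelope size estimate helps judge how much infrastructure is really needed. The sketch below assumes a dense table of 8-byte numeric values; the record count comes from the question, but the feature counts are hypothetical placeholders, since the question does not say how wide each record is.

```python
def estimate_size_gib(n_records, n_features, bytes_per_value=8):
    """Raw size in GiB of a dense numeric table (no index or format overhead)."""
    return n_records * n_features * bytes_per_value / 2**30


N_RECORDS = 140_000_000  # record count stated in the question

# Hypothetical feature counts; substitute the real column count.
for n_features in (10, 50, 100):
    size = estimate_size_gib(N_RECORDS, n_features)
    print(f"{n_features:>3} features: about {size:.1f} GiB")
```

If the estimate is in the low tens of GiB, a single large instance with chunked processing may be enough; much larger tables lean toward a distributed setup.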