I am working on a project that deals with a large amount of data, and I am thinking of hosting it on EC2. I intend to use Hadoop for the computation and a NoSQL system (e.g. HBase or Cassandra) to store the data. The NoSQL system must be persistent (I don't want to lose my data). As far as I know, I need to launch VMs to host Hadoop and the NoSQL system, but the VMs are not persistent. Is there any other way to host the data storage system persistently (not only the data, but also the system that manages it) while still making use of the computation Amazon provides?
I guess my scenario is similar to that of people who host their databases persistently.
I think you need to look at using "Reserved Instances" and "Elastic Block Store" (EBS).
http://aws.amazon.com/ec2/reserved-instances/
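If I understand your question correctly, you would want a reserved instance that you always leave running, with an EBS volume attached for persistent storage of your data. The reservation itself only lowers the cost of keeping an instance running; the persistence comes from the EBS volume, which, when created separately and attached, exists independently of the instance. EBS can also back up "snapshots" of a volume to S3.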
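To make the EBS part concrete, here is a rough sketch in Python of creating a volume, attaching it to a running instance, and taking a snapshot. It assumes the boto3 library and an EC2 client object passed in as `ec2`; the function names, the 100 GiB default size, and the `/dev/sdf` device name are my own placeholders, not anything from your setup.

```python
def provision_persistent_volume(ec2, instance_id, zone, size_gib=100, device="/dev/sdf"):
    """Create an EBS volume and attach it to an already running instance.

    `ec2` is assumed to be a boto3 EC2 client. The size and device name
    are illustrative defaults, not recommendations.
    """
    # The volume must live in the same availability zone as the instance.
    volume = ec2.create_volume(AvailabilityZone=zone, Size=size_gib)
    volume_id = volume["VolumeId"]

    # Block until the new volume is ready to be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id


def snapshot_volume(ec2, volume_id, description="backup of data volume"):
    """Start a point-in-time snapshot of the volume and return its ID."""
    snapshot = ec2.create_snapshot(VolumeId=volume_id, Description=description)
    return snapshot["SnapshotId"]
```

With boto3 installed and credentials configured, the client would come from something like `boto3.client("ec2", region_name="us-east-1")` (the region is just an example). After attaching, you would still need to format and mount the volume on the instance and point the HBase/Cassandra data directory at it.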