Search code examples
sql-serveramazon-web-servicesamazon-ec2rds

What technology would you suggest to manage hundreds of millions of strings


I am looking at a project that involves managing hundreds of millions of strings (varchar(100)) and some related data (a few ints, some floats, etc.). The project owner has mandated that this be performed in the Amazon cloud.

These strings will be updated continuously (about 1% per day) via batch jobs. There will be several key reports that will run daily against this data.

As a SQL Server DBA my first instinct is to put it into an EC2 instance running SQL Server, however I would like to be able to present other option (RDS, MySQL, non-RDBMS, etc.)

Any suggestions? What has worked for you in the past?


Solution

  • In Amazon Cloud you have the following options:

    • DynamoDB - KeyValue NoSQL highly scalable Database. You can store almost infinite number of keys with it. The two use cases for it are to do key lookup, and key+range scan. To have a more complex query against it for reports it not recommended.

    • Redshift - Data warehouse in huge scale that is working against standard SQL clients (PostgreSQL drivers). It is also can support almost unlimited number of rows in a columnar structure. You can run all your favorite reports on top of it.

    • CloudSearch - Very scalable search indexer with the ability to use the attributes for creation of facets (how many items in each range). Depends on your data structure and query types, it might be useful.

    • RDS - Relational Database Service, which support MySQL, Oracle and MS-SQL engines. This is similar to running these databases on standard EC2 instances, but with some DBA tasks (Backup, Restore, Scaling...) done easier.

    • EC2 - put your favorite NoSQL (MongoDB, Redis, Couchbase...) or RDBMS (PostgreSQL, MySQL...) on any instance types, including the beefy machines with tons of memory and disks.