It's been a long time since I haven't created a project from scratch, and now document-oriented databases (as well as ODM) have become quite popular, so I have to consider them before blindly going the relational route.
Could anyone try to list the motivations / project criteria that could lead to one choice or another?
ORM / relational database / SQL
Pros:
- well-understood, standard approach
- maps well to data with consistent structure
- maps well to data with multiple relationships between multiple entities
- has extensive join capabilities
- has transactions
- scalable to huge number of transactions per second (using MySQL Cluster, Fusion-IO, etc)
Cons:
- hard to scale to huge volume of data, if performance is also a concern
- does not map well to data with variable structure (or semi-structured)
- persisting objects requires a glue/translation layer which can be a performance bottleneck (and if done wrong also can be very verbose)
ODM / document database / NoSQL
Pros:
- scalable to huge volume of data, and to huge numbers of relatively independent queries
- high availability, sharding, multi-master, ...
- maps well to semi-structured data
- maps well to data with more variable structure
- data model can be more flexible
- queries do not have to be translated to SQL (the native NoSQL query style may or may not be better suited for some uses, and no overhead from SQL driver/parsing/etc)
- (for object databases) maps directly to objects, no need for object-relational translation
Cons:
- often, no join (or limited version of join)
- often, no transactions (or a limited version of transactional consistency/atomicity)
How to decide
Based on the type of data and usage patterns:
- Does the data have a uniform structure? (relational) ... or a variable/inconsistent structure? (document)
- Does typical usage read/write a single type of entity? (document) ... or a view which is composed of properties of multiple entities? (relational)
- Are transactions required? (relational) ... or no transactions needed? (document)
Based on the scaling/performance requirements:
- Huge data + few, slow, complex read/writes? (data warehousing type scenario) => relational
- Huge data + huge volume of simple read/writes? (craigslist backend type scenario) => document
- Huge data + fast, complex reads/writes? => this is difficult; either use relational and try to scale it up, or use document and try to simplify queries
- Moderate data + fast transactional writes? (banking type scenario) => relational
- Moderate data + moderate read/write? => pick any based on vendor/tool support, familiarity, etc
References
(Background: I haven't done anything in this area very recently, but a few years ago I built a large system which used replicated MySQL + Sphinx, ie a relational and document hybrid)