Search code examples
mysqlcouchdbnosqldocument-oriented-dbschemaless

Does this schema sound better suited for a document-oriented data store or relational?


Disclaimer: let me know if this question is better suited for serverfault.com


I want to store information on music, specifically:

  • genres
  • artists
  • albums
  • songs

This information will be used in a web application, and I want people to be able to see all of the songs associated to an album, and albums associated to an artist, and artists associated to a genre.

I'm currently using MySQL, but before I make a decision to switch I want to know:

  1. How easy is scaling horizontally?
  2. Is it easier to manage than an SQL based solution?
  3. Would the above data I want to store be too hard to do schema-free?
  4. When I think association, I immediately think RDBMSs; can data be stored in something like CouchDB but still have some kind of association as stated above?
  5. My web application requires replication, how well does CouchDB or others handle this?

Solution

  • This kind of information is ideally suited to document databases. As with much real-world data, it is not inherently relational, so shoe-horning it into a relational schema will bring headaches down the line (even using an ORM - I speak from experience). Ubuntu already uses CouchDB for storing music metadata, as well as other things, in their One product.

    Taking the remainder of your questions one-by-one:

    1. Horizontal scaling is WAY easier than with RDBMS. This is one of the many reasons big sites like Facebook, Digg and LinkedIn are using, or are actively investigating, schema-less databases. For example, sharding (dividing your data across different nodes in a system) works beautifully thanks to a concept called Eventual Consistency; i.e., the data may be inconsistent across nodes for a while, but it will eventually resolve to a consistent state.
    2. It depends what you mean by "manage"... Installation is generally quick and easy to complete. There are no user accounts to configure and secure (this is instead generally done in the application's business logic layer). Working with a document DB in real time can be interesting: there's no ad hoc querying in CouchDB, for example; you have to use the Futon UI or communicate with it via HTTP requests. MongoDB, however, does support ad hoc querying.
    3. I shouldn't think so. Bastien's answer provides a good example of a JSON document serialising some data. The beauty of schemaless DBs is that fields can be missing from one document and present in another, or the documents can be completely different from one another. This removes many of the problems involved with RDBMS' null value, which are many and varied.
    4. Yes; the associations are stored as nested documents, which are parsed in your application as object references, collections, etc. In Bastien's answer, the "songs" key identifies an array of song documents.
    5. This is very similar to your first question about horizontal scaling (horizontal scaling and replication are intertwined). As the CouchIO blog post Bastien mentioned states, "Replication … has been baked into CouchDB from the beginning.". My understanding is that all document databases handle replication well, and do so more easily than it is to set it up in an RDBMS.

    Were you to decide you wanted to store the song file itself along with the metadata, you could do that too in CouchDB, by supplying the song file as an attachment to the document; further more, you wouldn't have any schema inconsistencies as a result of doing this, because there is no schema!

    I hope I haven't made too many missteps here; I'm quite new to document DBs myself.