Search code examples
data-modelingravendbdocument-databasedocument-based-database

How would I model data that is heirarchal and relational in a document-oriented database system like RavenDB?


Document oriented databases (particularly RavenDB) are really intriguing me, and I'm wanting to play around with them a bit. However as someone who is very used to relational mapping, I was trying to think of how to model data correctly in a document database.

Say I have a CRM with the following entities in my C# application (leaving out unneeded properties):

public class Company
{
    public int Id { get; set; }
    public IList<Contact> Contacts { get; set; }
    public IList<Task> Tasks { get; set; }
}

public class Contact
{
    public int Id { get; set; }
    public Company Company { get; set; }
    public IList<Task> Tasks { get; set; }
}

public class Task
{
    public int Id { get; set; }
    public Company Company { get; set; }
    public Contact Contact { get; set; }
}

I was thinking of putting this all in a Company document, as contacts and tasks do not have a purpose out side of companies, and most of the time query for a task or contacts will also show information about the associated company.

The issue comes with Task entities. Say the business requires that a task is ALWAYS associated with a company but optionally also associated with a task.

In a relational model this is easy, as you just have a Tasks table and have the Company.Tasks relate to all tasks for the company, while Contact.Tasks only show the tasks for the specific Task.

For modeling this in a document database, I thought of the following three ideas:

  1. Model Tasks as a separate document. This seems kind of anti-document db as most of the time you look at a company or contact you will want to see the list of tasks, thus having to perform joins over documents a lot.

  2. Keep tasks that are not associated with a contact in the Company.Tasks list and put tasks assocaited with a contact in the list for each individual contacts. This unfortunately means that if you want to see all tasks for a company (which will probably be a lot) you have to combine all tasks for the company with all tasks for each individual contact. I also see this being complicated when you want to disassociate a task from a contact, as you have to move it from the contact to the company

  3. Keep all tasks in the Company.Tasks list, and each contact has a list of id values for tasks it is associated with. This seems like a good approach except for having to manually take id values and having to make a sub-list of Task entities for a contact.

What is the recommended way to model this data in a document oriented database?


Solution

  • Use denormalized references:

    http://ravendb.net/faq/denormalized-references

    in essence you have a DenormalizedReference class:

    public class DenormalizedReference<T> where T : INamedDocument
    {
        public string Id { get; set; }
        public string Name { get; set; }
    
        public static implicit operator DenormalizedReference<T> (T doc)
        {
            return new DenormalizedReference<T>
            {
                Id = doc.Id,
                Name = doc.Name
            }
        }
    }
    

    your documents look like - i've implemented the INamedDocument interface - this can be whatever you need it to be though:

    public class Company : INamedDocument
    {
        public string Name{get;set;}
        public int Id { get; set; }
        public IList<DenormalizedReference<Contact>> Contacts { get; set; }
        public IList<DenormalizedReference<Task>> Tasks { get; set; }
    }
    
    public class Contact : INamedDocument
    {
        public string Name{get;set;}
        public int Id { get; set; }
        public DenormalizedReference<Company> Company { get; set; }
        public IList<DenormalizedReference<Task>> Tasks { get; set; }
    }
    
    public class Task : INamedDocument
    {
        public string Name{get;set;}
        public int Id { get; set; }
        public DenormalizedReference<Company> Company { get; set; }
        public DenormalizedReference<Contact> Contact { get; set; }
    }
    

    Now saving a Task works exactly as it did before:

    var task = new Task{
        Company = myCompany,
        Contact = myContact
    };
    

    However pulling all this back will mean you're only going to get the denormalized reference for the child objects. To hydrate these I use an index:

    public class Tasks_Hydrated : AbstractIndexCreationTask<Task>
    {
        public Tasks_Hydrated()
        {
            Map = docs => from doc in docs
                          select new
                                     {
                                         doc.Name
                                     };
    
            TransformResults = (db, docs) => from doc in docs
                                             let Company = db.Load<Company>(doc.Company.Id)
                                             let Contact = db.Load<Contact>(doc.Contact.Id)
                                             select new
                                                        {
                                                            Contact,
                                                            Company,
                                                            doc.Id,
                                                            doc.Name
                                                        };
        }
    }
    

    And using your index to retrieve the hydrated tasks is:

    var tasks = from c in _session.Query<Projections.Task, Tasks_Hydrated>()
                        where c.Name == "taskmaster"
                        select c;
    

    Which i think is quite clean :)

    As a design conversation - the general rule is that if you ever need to load the child documents alone as in - not part of the parent document. Whether that be for editing or viewing - you should model it with it's own Id as it's own document. Using the method above makes this quite simple.