Search code examples
c#indexingnosqlravendbravendb4

RavenDb index to filter&sort on properties of the nested structure/collection (fanout index)


I'm looking for a way to create a static index to serve filtering/sorting queries for a combination of property values in the nested structure (a collection of objects) along with the structure-container. It seems to be not trivial due to the following reasons:

  • If properties of the nested structure/collection are separated out into individual fields of the index (individual collections), then it makes impossible using the AND condition when filtering on 2+ properties of the nested structure/collection.
  • The fanout index complexity (see an example), which makes any solution to run too slow.

Given the following persistent model:

public class Document
{
    public string Title { get; set; }

    public List<UserChange> RecentModifications { get; set; }
}

where

public class UserChange
{
    public string UserId { get; set; }
    public DateTime Timestamp { get; set; }
}

Question: How to build the index for Document to filter/sort by a combination of all the fields: Title, UserId and Timestamp?

Possible use cases:

  • get all documents containing the word 'contract' for a certain user and a date range
  • sort documents containing the word 'contract' by last modification made by the user.

P.S. I understand that the indexing restrictions can by bypassed by restructuring the persistence model - storing a structure for recently modified documents in the User document, but it'd impose some other limitations, which I'd like to avoid.


Solution

  • The problem can be solved by using Indexes with Dynamic Fields. It allows to keep logical data structure and avoids creating a fanout index.

    Solution

    Create the following index for the Document collection from the above:

    public class MyIndex : AbstractIndexCreationTask<Document, DocumentIndDto>
    {
        public MyIndex()
        {
            // Add fields that are used for filtering and sorting
            Map = docs =>
                from e in docs
                select new
                {
                    Title = e.Title, 
                    _ = e.RecentModifications.Select( x => CreateField ($"{nameof(Document.RecentModifications)}_{x.UserId}", x.Timestamp))
                };
        }
    }
    
    public class DocumentIndDto
    {
        public string Title { get; set; }
        public Dictionary<string,DateTime> RecentModifications { get; set; }
    }
    

    Query on MyIndex like

    var q = s.Query<DocumentIndDto, MyIndex>()
                    .Where(p => p.Title == "Super" && p. RecentModifications["User1"] < DateTime.Now);
    

    Explanation

    The specified index with dynamic fields will generate extra fields and terms for each record in the following format:

    RecentModifications_User1 = '2018-07-01';
    RecentModifications_User2 = '2018-07-02';
    

    The format is important, because when you use a dictionary in your high-level query like myDic[key], it gets transformed to myDic_key in the generated RQL. Hence, it will allow us to use these fields in queries.

    If you query with using the usual Query rather than DocumentQuery (see docs), then you need a proper data type for LINQ to work. For that purpose I created DocumentIndDto class, where my RecentModifications has become a dictionary, so I could use it in a high-level query and get correct RQL like

    from index 'MyIndex' where Title = $p0 and RecentModifications_User1 = $p1
    

    For more details, see my discussion on the subject with Oren Eini (aka Ayende Rahien).