Search code examples
ruby-on-railsmongodbperformancemongoid

query efficiency on single embedded document in mongodb / rails


My question is about efficiency and perhaps good practices.

I am envisioning the following data model in rails with mongoid:

class Subscription
    ... some attributes...
  embeds_one :schedule   # Note, there will be only one embedded document!
end

class Schedule
  field :scheduleType, type: String
  field :dayOfMonth, type: Integer
  field :hour, type: Integer
  field :dayOfWeek, type: String
  embedded in :subscription
end

I mainly want to find the subscription by a relatively simple query on the embedded schedule, e.g.

Subscription.where(scheduleType: 'month').and(dayOfMonth: 21).and(hour: 12).selector

I wonder if this is efficient. Basically:

  • is there a performance penalty for querying on an embedded document?
  • does it make a difference to index the fields I am searching on?
  • would it be more efficient to simply make the fields a part of the parent Subscription object. Like so:
class Subscription
  ... some attributes...
  field :scheduleType, type: String
  field :dayOfMonth, type: Integer
  field :hour, type: Integer
  field :dayOfWeek, type: String
end

I could do that but thought it was more elegant to embed since I already have the Schedule object.

I am curious to hear the expert opinions on this.


Solution

  • No performance impact and no difference between top level field and subdocument field on database side. It's a single document the database need to search/fetch regardless of structure - the fields of the embedded document are translated to a dot-notation name on database side.

    Just don't confuse it with other associations, which may have dramatic impact. The statement above applies to embeds_one exclusively.

    The same thing for indexing - structure of the document doesn't matter in this case. Index on a nested field have the same effect as on the top level field. It can dramatically reduce time to search, have very little impact, or in some edge cases can slow down the whole cluster. It all depends on how large your collection is and cardinality of indexed data.

    For the query

    Subscription.where(scheduleType: 'month').and(dayOfMonth: 21).and(hour: 12).selector
    

    a compound index on scheduleType, dayOfMonth, and hour will work the best.

    For other queries the index will be different.

    An index on scheduleType alone won't help much, assuming you have only few types, so the index won't be selective enough to give meaningful advantage, especially on smaller collections.

    The tricky part about indices in mongo is that it doesn't use more than 1 index per query. I mean in theory there is index intersection, but in practice it never happens.