Search code examples
indexingrangerethinkdbreql

Rethinkdb between with multiple keys


I've the following structrue :

Item {
  id string,
  title string
  tags []string,
  time int,
  parent string
}

What i want, list all items with tags [tag1, tag2, ... etc] of parent "parent-1" and order by time

So i did this

r.db("db").table("tb").indexCreate("allByTime", function(row){
  return row("tags").map(function(tag){
    return [row("parent"), tag, row("time")]
  })
})

It worked with a query like this

r.db("db").table("tb").between(["parent-1", "tag1", 0], ["parent-1", "tag1", <some-bigger-timestamp>], {index: "allByTime"}).orderBy(...)

But also i want something like this

r.db("db").table("tb").between(["parent-1", ["tag1", "tag2"], 0], ["parent-1", ["tag1", "tag2"], <some-bigger-timestamp>], {index: "allByTime"}).orderBy(...)

Any suggestions ?

NOTE -> i don't want to use r.filter(...)

I've tried something like this

r.union(<between-query-1>, <between-query-2>, ...)

but i don't know what the overhead will be in large table with many between queries .


Solution

  • Honestly, I don't think what you want is really possible. If you think about it, what you want is a compound index with a multi index inside of it and I don't think there's any way to express that in RethinkDB.

    In this query:

    r.db("db").table("tb")
     .between(
        ["parent-1", ["tag1", "tag2"], 0], 
        ["parent-1", ["tag1", "tag2"], <some-bigger-timestamp>], 
        {index: "allByTime"}
      )
      .orderBy(...)
    

    It seems that what you want here is: all documents with a parent of 'parent1' AND with 'tag1' AND 'tag2' with a timestamp BETWEEN 0 and <some-bigger-timestamp>. It seems that, if that is the case, then union wouldn't really work, because you can't query fields by multiple values.

    My Proposed Solution: Just use filter!

    In the query you had before:

    r.db("db").table("tb")
     .between(
       ["parent-1", "tag1", 0], 
       ["parent-1", "tag1", <some-bigger-timestamp>], 
       {index: "allByTime"}
     )
     .orderBy(...)
    

    You are probably narrowing down your data to a portion of your data in which you CAN use filter. I think the point of filter is that you should never use it. The point of filter is to use it smartly when (using a large dataset), you've already used indexes smartly (which you have). If the result of that between query is more than a couple of thousand, then you might have problems, but if it's not, I wouldn't worry about it. If it is, then maybe you can update your question with more about your data (how many different parents, tags, timestamps present) and what make a single document unique.