Search code examples
rethinkdbrethinkdb-javascript

RethinkDb OrderBy Before Filter, Performance


The data table is the biggest table in my db. I would like to query the db and then order it by the entries timestamps. Common sense would be to filter first and then manipulate the data.

queryA = r.table('data').filter(filter).filter(r.row('timestamp').minutes().lt(5)).orderBy('timestamp')

But this is not possible, because the filter creates a side table. And the command would throw an error (https://github.com/rethinkdb/rethinkdb/issues/4656).

So I was wondering if I put the orderBy first if this would crash the perfomance when the datatbse gets huge over time.

queryB = r.table('data').orderBy('timestamp').filter(filter).filter(r.row('timestamp').minutes().lt(5))

Currently I order it after querying, but usually datatbases are quicker in these processes.

queryA.run (err, entries)->
  ...
  entries = _.sortBy(entries, 'timestamp').reverse() #this process takes on my local machine ~2000ms

Question:

What is the best approach (performance wise) to query this entries ordered by timestamp.

Edit: The db is run with one shard.


Solution

  • Using an index is often the best way to improve performance.

    For example, an index on the timestamp field can be created:

    r.table('data').indexCreate('timestamp')
    

    It can be used to sort documents:

    r.table('data').orderBy({index: 'timestamp'})
    

    Or to select a given range, for example the past hour:

    r.table('data').between(r.now().sub(60*60), r.now(), {index: 'timestamp'})
    

    The last two operations can be combined int one:

    r.table('data').between(r.now().sub(60*60), r.maxval, {index: 'timestamp'}).orderBy({index: 'timestamp'})
    

    Additional filters can also be added. A filter should always be placed after an indexed operation:

    r.table('data').orderBy({index: 'timestamp'}).filter({colour: 'red'})
    

    This restriction on filters is only for indexed operations. A regular orderBy can be placed after a filter:

    r.table('data').filter({colour: 'red'}).orderBy('timestamp')
    

    For more information, see the RethinkDB documentation: https://www.rethinkdb.com/docs/secondary-indexes/python/