Search code examples
pythonrethinkdbreql

rethinkdb: get rows created no later than two weeks from now with unique id


I have a table with some rows that look like this:

{
  "id": "12345"
  "created_date": Fri May 27 2016 22:06:25 GMT+00:00 ,
} {
  "id": "6789"
  "created_date": Mon May 30 2016 07:48:35 GMT+00:00 ,
}

etc...
  1. I am attempting to first filter the rows down by only getting ones created no later than 2 weeks ago from today.

  2. Then I am attempting to filter down by getting ONLY unique ids (no duplicates) but still the latest of there id.

This filtering can be done in reverse, if that is more efficient.

Something along the lines of this Pseudocode:

r.db().table().filter( r.time(r.row['created_date']) > r.now()-2_weeks).filter(latest-uniques-only)

Solution

  • Here are two options for achieving the desired results.

    Using a filter and group

    You could use the following query to perform this lookup (the code is using Javascript API and tested on a table through the Data Explorer):

    r.table('the_table')
    .group('the_id') // group by the id
      .filter((row) => {
        return row('created_date').gt(r.now().sub(86400*14)); // only include records from the last 14 days
      })
      .orderBy(r.desc('created_date')) // order by latest first
      .pluck('the_id', 'created_date') // just include id and date in results
    
    .ungroup() // stop performing actions on the group
    .map((row) => row('reduction').nth(0)); // map the results only retrieving the first row of each group, the latest for each id
    

    Provided that the_id field is not unique, then this query will return the latest record for each the_id that was created in the last 2 weeks.

    Use a secondary index

    To make the above query more efficient/performant you could put an index on created_date and then use the between() command to find all created dates within the last 2 weeks.

    First create an index on the date field:

    r.table('the_table').indexCreate('created_date');
    

    Then you can use this to more efficiently filter your table.

    r.table('the_table')
     .between(r.now().sub(86400*84), r.now(), { index: 'created_date'})
     .group('the_id')
      .orderBy(r.desc('created_date'))
      .pluck('the_id', 'created_date')
     .ungroup()
     .map((row) => row('reduction').nth(0))