Search code examples
pythonrethinkdbrethinkdb-python

Find documents not referenced by another table


Kind of a "not_eq_join", if that makes sense.

A have a table A with documents containing a _key field (it's the primary key), and another table B that has a refs field that points to A._key, this is btw a multi index so it may contain multiple references to documents in A. A is quite large, however B is not (probably less than 1000)

Entry in A:

{
  '_key': 'doc1',
  ...
}

Entry in B:

{
  'refs': ['doc1'],
  ...
}

The question is; how do I find documents in A that is not referenced by any document in B ?

My naive solution would be to retrieve all unique values from the B.refs multi index (but how to do that?) to an array variable, and then do something like: .filter(refArray.contains(R.row['_key']).not_()) on the A selection (filtered by other criteria as well).

But this seems wrong, any better ideas?

Of course, I'd like to do this "as quickly as possible" :)


Solution

  • Assuming that the number of documents referenced by B is small, your solution is probably the best one.

    You can get the distinct values in an index (including multi-indexes) by writing r.table('B').distinct({index: 'refs'}).

    If the the number of documents referenced by B is large compared to the size of A, this gets a lot harder to do efficiently.