Kind of a "not_eq_join", if that makes sense.
A have a table A
with documents containing a _key
field (it's the primary key), and another table B
that has a refs
field that points to A._key
, this is btw a multi index so it may contain multiple references to documents in A
. A is quite large, however B is not (probably less than 1000)
Entry in A
:
{
'_key': 'doc1',
...
}
Entry in B
:
{
'refs': ['doc1'],
...
}
The question is; how do I find documents in A
that is not referenced by any document in B
?
My naive solution would be to retrieve all unique values from the B.refs
multi index (but how to do that?) to an array variable, and then do something like: .filter(refArray.contains(R.row['_key']).not_())
on the A
selection (filtered by other criteria as well).
But this seems wrong, any better ideas?
Of course, I'd like to do this "as quickly as possible" :)
Assuming that the number of documents referenced by B
is small, your solution is probably the best one.
You can get the distinct values in an index (including multi-indexes) by writing r.table('B').distinct({index: 'refs'})
.
If the the number of documents referenced by B
is large compared to the size of A
, this gets a lot harder to do efficiently.