Tags: mapreduce, rethinkdb, rethinkdb-python

RethinkDB map-reduce: has_fields not working properly in map


I'm trying to find the percent of records (grouped by company) that do not have phone numbers. I can do this with the following two queries:

r.table('users') \
 .merge(lambda u: {'groups': r.table('groups').get_all(r.args(u['group_ids'])).coerce_to('array')}) \
 .filter(lambda u: u.has_fields('phone')) \
 .group(lambda u: u['groups'][0]['company']).count().run()

and to get the count of all records:

r.table('users') \
 .merge(lambda u: {'groups': r.table('groups').get_all(r.args(u['group_ids'])).coerce_to('array')}) \
 .group(lambda u: u['groups'][0]['company']).count().run()

However, I'd like to use map-reduce to do this in a single query, which might also be more efficient. Here is my query, but it doesn't work: both of the numbers (phone and count) come out the same:

r.table('users') \
 .merge(lambda u: {'groups': r.table('groups').get_all(r.args(u['group_ids'])).coerce_to('array')}) \
 .group(lambda u: u['groups'][0]['company']) \
 .map(lambda u: { 'phone': 1 if u.has_fields('phone') else 0, 'count': 1 }) \
 .reduce(lambda a, b: {'phone': a['phone'] + b['phone'], 'count': a['count'] + b['count'] }).run()

So my question is, why doesn't has_fields() work in the map command, but does in the filter command?


Solution

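  • The problem is that you're using Python's conditional expression (x if cond else y). Python doesn't expose a way for the driver to hook into it, so the driver never sees the conditional: Python evaluates it on the client while the query is being built, not on the server for each row. At that point u.has_fields('phone') is a query object rather than a boolean, and it appears to be treated as true, which matches phone always equaling count. If you use r.branch instead (r.branch(u.has_fields('phone'), 1, 0)) it should work.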
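To illustrate the point, here is a minimal sketch that runs without a database. The Term class and the branch function are hypothetical stand-ins written for this example, not the RethinkDB driver's real classes; they only mimic the idea that a lambda passed to map is called once on the client to build a description of the query.

```python
class Term:
    """Hypothetical stand-in for a driver query term (not RethinkDB's real class)."""

    def __init__(self, desc):
        self.desc = desc

    def has_fields(self, name):
        # Builds a description of the check; nothing is evaluated against real data.
        return Term(f"{self.desc}.has_fields({name!r})")

    def __repr__(self):
        return self.desc


def branch(test, if_true, if_false):
    """Hypothetical stand-in for r.branch: records the conditional as part of the query."""
    return Term(f"branch({test!r}, {if_true!r}, {if_false!r})")


row = Term("row")

# Python's conditional expression runs immediately on the client. A plain object
# is truthy by default, so the result is the constant 1 and the test is lost.
python_conditional = (lambda u: 1 if u.has_fields('phone') else 0)(row)

# The branch form keeps the test inside the query description instead.
server_side = (lambda u: branch(u.has_fields('phone'), 1, 0))(row)

print(python_conditional)
print(server_side)
```

Applied to the query in the question, the map step would become .map(lambda u: {'phone': r.branch(u.has_fields('phone'), 1, 0), 'count': 1}), with the rest of the query unchanged.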