Search code examples
pythonmongodbpymongonested-fields

Finding document containing array of nested names in pymongo (CrossRef data)


I have a dataset of CrossRef works records stored in a collection called works in MongoDB and I am using a Python application to query this database.

I am trying to find documents based on one author's name. Removing extraneous details, a document might look like this:

{'DOI':'some-doi',
'author':[{'given': 'Albert','family':'Einstein',affiliation:[]},
{'given':'R.','family':'Feynman',affiliation:[]},
{'given':'Paul','family':'Dirac',affiliation:['University of Florida']}]
}

It isn't clear to me how to combine the queries to get just Albert Einstein's papers.

I have indexes on author.family and author.given, I've tried:

cur = works.find({'author.family':'Einstein','author.given':'Albert'})

This returns all of the documents by people called 'Albert' and all of those by people called 'Einstein'. I can filter this manually, but it's obviously less than ideal.

I also tried:

cur = works.find({'author':{'given':'Albert','family':'Einstein','affiliation':[]}})

But this returns nothing (after a very long delay). I've tried this with and without 'affiliation'. There are a few questions on SO about querying nested fields, but none seem to concern the case where we're looking for 2 specific things in 1 nested field.


Solution

  • Your issue is that author is a list.

    You can use an aggregate query to unwind this list to objects, and then your query would work:

    cur = works.aggregate([{'$unwind': '$author'},
                           {'$match': {'author.family':'Einstein', 'author.given':'Albert'}}])
    

    Alternatively, use $elemMatch which matches on arrays that match all the elements specified.

    cur = works.find({"author": {'$elemMatch': {'family': 'Einstein', 'given': 'Albert'}}})
    

    Also consider using multikey indexes.