Search code examples
javascriptnode.jsmongodbaggregation-frameworkmongoskin

Mongodb: how to return elements of array that are present in the query list


I have a collection called shops. Structure is like:

[
     {
          '_id' : id1,
          'details' : {name: 'shopA'},
          'products' : [{
               _id: 'p1',
               details:  {
                    'name': 'product1'
               }
          },{
               _id: 'p2',
               details:  {
                    'name': 'product2'
               }
          }, {
               _id: 'p4',
               details:  {
                    'name': 'product4'
               }
          }
     },{
          '_id' : id2,
          'details' : {name: 'shopB'},
          'products' : [{
               _id: 'p1',
               details:  {
                    'name': 'product1'
               }
          },{
               _id: 'p4',
               details:  {
                    'name': 'product4'
               }
          }, {
               _id: 'p5',
               details:  {
                    'name': 'product5'
               }
          }
     },{
          '_id' : id3,
          'details' : {name: 'shopC'},
          'products' : [{
               _id: 'p1',
               details:  {
                    'name': 'product1'
               }
          },{
               _id: 'p2',
               details:  {
                    'name': 'product2'
               }
          }, {
               _id: 'p3',
               details:  {
                    'name': 'product3'
               }
          }
     },{
          '_id' : id4,
          'details' : {name: 'shopOther'},
          'products' : [{
               _id: 'p10',
               details:  {
                    'name': 'product10'
               }
          },{
               _id: 'p12',
               details:  {
                    'name': 'product12'
               }
          }, {
               _id: 'p13',
               details:  {
                    'name': 'product13'
               }
          }
     }
]

Now user can select some of the products from menu and try to find the shops for those. The result should be all the shops which provide atleast one of the selected items.

Example,

Suppose users select ['p1', 'p2', 'p3'] //ids Then only three shops id1, id2, id3 will be listed(id4 has none of these items), plus the structure is such that it removes rest of the products of a shop(which were not listed) from the document in the results array.

Is there a way, I can get such result from mongodb directly?


Solution

  • Since you did ask nicely and also so well formed then there is some consideration that similar answers may not actually suit for reference, especially if your experience level with the MongoDB product is low.

    Options like $redact may seem simple, and they are often well suited. But this is not a case for how you would need to construct the statement:

    db.collection.aggregate([
      { "$match": { "products._id": { "$in": ["p1","p2","p3"] } }},
      { "$redact": {
        "$cond": {
          "if": {
            "$or": [
              { "$eq": [ "$_id", "p1" ] },
              { "$eq": [ "$_id", "p2" ] },
              { "$eq": [ "$_id", "p3" ] }
            ]
          },
          "then": "$$DESCEND",
          "else": "$$PRUNE"
        }
      }}
    ])
    

    That works with the "not so obvious" use of $or in an aggregation operator. At least in terms of the correct syntax and form, but it is actually a "complete fail". The reasoning is that because $redact is generally a "recursive" operation, and it inspects at "all levels" of the document and not just at a specific level. So of your in the "top level" the _id assertion will fail as that top level field of the same name is not going to match that condition.

    There really isn't anything else you can really do about this, but considering that _id in the array is actually a "unique" element then you can always perform this operation in an $project stage with the help of $map and $setDifference:

    db.collection.aggregate([
      { "$match": { "products._id": { "$in": ["p1","p2","p3"] } }},
      { "$project": {
        "details": 1,
        "products": {
          "$setDifference": [
            { "$map": {
              "input": "$products",
              "as": "el",
              "in": {
                "$cond": {
                  "if": { 
                    "$or": [
                      { "$eq": [ "$$el._id", "p1" ] },
                      { "$eq": [ "$$el._id", "p2" ] },
                      { "$eq": [ "$$el._id", "p3" ] }
                    ]
                  },
                  "then": "$$el",
                  "else": false
                }
              }
            }},
            [false]
          ]
        }
      }}
    ])
    

    It seems lengthy, but it actually very efficient. The $map operator processes arrays "inline" for each document and acting on each element to produce a new array. The false assertion made under $cond where the condtions are not a match is balanced by considering the "set" of results in comparison to $setDifference, which effectively "filters" the false results from the resulting array, leaving only the valid matches behind.

    Of course where the _id values or entire objects were not truly "unique" then a "set" would no longer be valid. With this consideration, as well as the truth that the mentioned operators are not available to versions of MongoDB prior to 2.6, then the more tradtional approach is to $unwind the array members and then "filter" them via a $match operation.

    db.collection.aggregate([
      { "$match": { "products._id": { "$in": ["p1","p2","p3"] } }},
      { "$unwind": "$products" },
      { "$match": { "products._id": { "$in": ["p1","p2","p3"] } }},
      { "$group": {    
          "_id": "$_id",
          "details": { "$first": "$details" },
          "products": { "$push": "$products" }
      }}
    ])
    

    Consideration is given that as per the other examples, the $match phase should be executed first in the pipeline in order to reduce the "possible" documents matching the condition. The "second" phase with $match does the actuall "filtering" of the document elements inside the array when in the "de-normalized" form.

    Since the array was "deconstructed" by $unwind, the purpose of $group is to "re-build" the array, "filtered" from the elements that do not match the condition.

    MongoDB also offers the positional $ operator in order to select matched array elements from a query condition. Like so:

    db.collection.find(
        { "products._id": { "$in": ["p1","p2","p3"] },
        { "details": 1, "products.$": 1 }
    )
    

    But the problem here is that this operator only supports the "first" match on the conditions supplied in the query document. This is a design intent, and as yet there is no strict operator syntax to cater for more than a single match.

    So your ultimate approach is currently to use the .aggregate() method in order to actually achieve the match filtering on inner arrays that you desire. Either that or filter the contents responded yourself in client code, depending on how palatable that ultimately is to you.