Search code examples
mongodbmongodb-queryaggregation-frameworkpymongo

Return documents depending on matching a given search set with accumulated distinct arrays from a document field


I have set with strings like this: a,b

and a collection like this:

{
    "owner": "anna",
    "letters": ["c"]
},

{
    "owner": "bob",
    "letters": ["b", "c"]
},

{
    "owner": "cai",
    "letters": ["a", "b"]
},
{
    "owner": "dora",
    "letters": ["a", "d"]
},
{
    "owner": "emil",
    "letters": ["a"]
},
{
    "owner": "fry",
    "letters": ["b"]
},

I want to get a random collection of documents where the set a,b matches a subset of accumulated distinct sets of the field "letters" of all documents in the collection.

A valid solution would be:

{
"owner": "cai",
"letters": ["a", "b"]
}

Also valid:

{
"owner": "emil",
"letters": ["a"]
},
{
"owner": "fry",
"letters": ["b"]
},

Also valid, since the sets of "letters" are distinct:


{
"owner": "emil",
"letters": ["a"]
},
{
"owner": "bob",
"letters": ["b", "c"]
},

Also valid:

{
"owner": "bob",
"letters": ["b", "c"]
},
{
"owner": "dora",
"letters": ["a", "d"]
},

NOT valid would be the following, since "b" is in both documents (should only be in one):

{
"owner": "bob",
"letters": ["b", "c"]
},

{
"owner": "cai",
"letters": ["a", "b"]
}

Also NOT valid, since "a" is in both documents:


{
"owner": "emil",
"letters": ["a"]
},

{
"owner": "cai",
"letters": ["a", "b"]
},

With the aggregation pipeline, in tried to group the documents by the field "letters" and randomize the order of the documents. I am stuck at how to apply the set rules to only return distinct documents without intersections between the arrays of field "letters" to match the search set.

Thanks for your help!


Solution

  • Use a sub-pipeline inside $lookup to search by a combination where the set union contains your criteria array (i.e. ["a", "b"]). Use $sample at the end to pick 1 combination. The combination would be stored in the $lookup result and you can just pick the combination you want.

    db.collection.aggregate([
      {
        "$lookup": {
          "from": "collection",
          "let": {
            l: "$letters"
          },
          "pipeline": [
            {
              "$match": {
                $expr: {
                  $eq: [
                    {
                      "$setIntersection": [
                        "$$l",
                        "$letters"
                      ]
                    },
                    []
                  ]
                }
              }
            },
            {
              "$match": {
                $expr: {
                  "$setIsSubset": [
                    [
                      "a",
                      "b"
                    ],
                    {
                      "$setUnion": [
                        "$$l",
                        "$letters"
                      ]
                    }
                  ]
                }
              }
            },
            {
              // optional for performance boost
              $limit: 10
            }
          ],
          "as": "candidates"
        }
      },
      {
        $unwind: "$candidates"
      },
      {
        $sample: {
          size: 1
        }
      }
    ])
    

    Mongo Playground