Tags: mongodb, except, set-theory, set-difference

MongoDB EXCEPT equivalent


I have a question about a problem I came across while trying to use $setDifference on a collection of documents.

What I want is every document that belongs to Root 1, minus those whose "reference.id" also occurs in a document that belongs to Root 2.

My collection represents two tree structures and basically looks like this:

/* Tree Root 1 */
{
    "_id" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
    "name" : "Root 1",
    "children" : [ 
        LUUID("ca01f1ab-7c32-4e6b-a07a-e0ee9d8ec5ac"), 
        LUUID("6dd8c8ed-4a60-41ca-abf1-a4d795a0c213")
    ]
}
/* Child 1 - Root 1 */
{
    "_id" : LUUID("ca01f1ab-7c32-4e6b-a07a-e0ee9d8ec5ac"),
    "parentId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
    "reference" : {
        "type" : "someType",
        "id" : LUUID("331503FB-C4D1-4F7A-A461-933C701EF9AB")
    },
    "rootReferenceId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
    "name" : "Child 1 (Root 1)"
}
/* Child 2 - Root 1 */
{
    "_id" : LUUID("6dd8c8ed-4a60-41ca-abf1-a4d795a0c213"),
    "parentId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
    "reference" : {
        "type" : "someType",
        "id" : LUUID("23E8B540-3EFB-455A-AA5C-2B67D6B59943")
    },
    "rootReferenceId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
    "displayName" : "Child 2 (Root 1)"
}
/* Tree Root 2 */
{
    "_id" : LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9"),
    "name" : "Root 2",
    "children" : [ 
        LUUID("ad4ad076-322e-4c26-8855-91c9b1912d1f"), 
        LUUID("66452420-dd2f-4d27-91c9-78bd0990817c")
    ]
}
/* Child 1 - Root 2 */
{
    "_id" : LUUID("ad4ad076-322e-4c26-8855-91c9b1912d1f"),
    "parentId" : LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9"),
    "reference" : {
        "type" : "someType",
        "id" : LUUID("331503FB-C4D1-4F7A-A461-933C701EF9AB")
    },
    "rootReferenceId" : LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9"),
    "displayName" : "Child 1 (Root 2)"
}

That means that, in the end, I want to get this document:

/* Child 2 - Root 1 */
{
    "_id" : LUUID("6dd8c8ed-4a60-41ca-abf1-a4d795a0c213"),
    "parentId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
    "reference" : {
        "type" : "someType",
        "id" : LUUID("23E8B540-3EFB-455A-AA5C-2B67D6B59943")
    },
    "rootReferenceId" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
    "displayName" : "Child 2 (Root 1)"
}

Its reference.id occurs under Root 1 but not under Root 2, so it stays in the result set. Child 1 of Root 1, by contrast, is excluded because its reference.id occurs under both roots.
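
To make the intended semantics concrete, here is a minimal plain-JavaScript sketch over in-memory stand-ins for the three child documents. It is not a MongoDB query: the UUIDs are shortened to placeholder strings, and exceptByReferenceId is a made-up helper name used only for illustration.

```javascript
// Stand-ins for the sample children; UUIDs are replaced by short
// placeholder strings (assumption, for readability only).
const ROOT_1 = "root-1";
const ROOT_2 = "root-2";
const docs = [
  { _id: "child-1-root-1", rootReferenceId: ROOT_1, reference: { type: "someType", id: "ref-A" } },
  { _id: "child-2-root-1", rootReferenceId: ROOT_1, reference: { type: "someType", id: "ref-B" } },
  { _id: "child-1-root-2", rootReferenceId: ROOT_2, reference: { type: "someType", id: "ref-A" } },
];

// Hypothetical helper: documents under `keepRoot` whose reference.id
// does not occur in any document under `excludeRoot`.
function exceptByReferenceId(allDocs, keepRoot, excludeRoot) {
  const excluded = new Set(
    allDocs
      .filter((d) => d.rootReferenceId === excludeRoot)
      .map((d) => d.reference.id)
  );
  return allDocs.filter(
    (d) => d.rootReferenceId === keepRoot && !excluded.has(d.reference.id)
  );
}

const result = exceptByReferenceId(docs, ROOT_1, ROOT_2);
console.log(result.map((d) => d._id)); // [ 'child-2-root-1' ]
```

Only the second child of Root 1 survives, which is the behaviour I would like to get from the aggregation framework.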

I already wrote an aggregation pipeline that groups the "reference.id" values per root:

db.getCollection('test').aggregate([
    {
        $match: {
            rootReferenceId: { $ne: null }
        }
    },
    {
        $group: {
            _id: "$rootReferenceId",
            referenceIds: { $addToSet: "$reference.id" } 
        }
    }
])

This returns:

/* 1 */
{
    "_id" : LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9"),
    "referenceIds" : [ 
        LUUID("331503fb-c4d1-4f7a-a461-933c701ef9ab")
    ]
}

/* 2 */
{
    "_id" : LUUID("9f3a73df-bca7-48b7-b111-285359e50a02"),
    "referenceIds" : [ 
        LUUID("23e8b540-3efb-455a-aa5c-2b67d6b59943"), 
        LUUID("331503fb-c4d1-4f7a-a461-933c701ef9ab")
    ]
}

Does anyone have an idea how I can $project this into a format that $setDifference accepts?

I think it needs to look like this:

{
    LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9") : [ 
        LUUID("331503fb-c4d1-4f7a-a461-933c701ef9ab")
    ],
    LUUID("9f3a73df-bca7-48b7-b111-285359e50a02") : [ 
        LUUID("23e8b540-3efb-455a-aa5c-2b67d6b59943"), 
        LUUID("331503fb-c4d1-4f7a-a461-933c701ef9ab")
    ]
}

Or is there a completely different way to achieve this that I am not aware of?

Any help is appreciated!

Edit (solution):

The solution I am using now is the one dnickless suggested. A really nice one, thanks a lot!


Solution

  • Here is what you could do without storing duplicate values in a string format. What's nice about this solution is that

    a) it returns the entire document that you are interested in, so you don't need a second query (if you do not need the entire document, the $filter expression can simply be replaced with the inner $setDifference expression, which returns only the reference ids)

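    b) it consists of very few and cheap stages (no grouping!). Note that $facet and its sub-pipelines cannot use indexes, so to benefit from an index on the rootReferenceId field (which I would recommend creating), put a $match on the two root ids in front of the $facet stage.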

    db.getCollection('test').aggregate([
      { "$facet": {
        "allInRoot1": [{
          "$match": { "rootReferenceId": LUUID("9f3a73df-bca7-48b7-b111-285359e50a02") }
        }],
        "allInRoot2": [{
          "$match": { "rootReferenceId": LUUID("27f2b4a6-5471-406a-a39b-1e0b0f8c4eb9") }
        }]
      }}, {
        "$project": {
          "difference": {
            "$filter": {
                "input": "$allInRoot1",
                "as": "this",
                "cond": { "$in": [ "$$this.reference.id", { "$setDifference": [ "$allInRoot1.reference.id", "$allInRoot2.reference.id" ] } ] }
            }
          }
        }
      }
    ])
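
As a variation, here is a sketch that builds the same pipeline as a plain JavaScript array, with a leading $match in front of $facet so that an index on rootReferenceId can serve the initial selection. The names buildExceptPipeline, idsOnly, allInRootA and allInRootB are made up for this example; the string ids in the usage line are placeholders for the real UUID values.

```javascript
// Hypothetical helper: returns the aggregation pipeline as a plain array.
// rootA / rootB are the two rootReferenceId values (UUIDs in the real collection).
function buildExceptPipeline(rootA, rootB, idsOnly = false) {
  // reference.id values present under rootA but not under rootB
  const diff = {
    $setDifference: ["$allInRootA.reference.id", "$allInRootB.reference.id"],
  };
  return [
    // Leading $match: unlike the $facet sub-pipelines, this stage can use an index.
    { $match: { rootReferenceId: { $in: [rootA, rootB] } } },
    {
      $facet: {
        allInRootA: [{ $match: { rootReferenceId: rootA } }],
        allInRootB: [{ $match: { rootReferenceId: rootB } }],
      },
    },
    {
      $project: {
        // idsOnly: return just the ids; otherwise return the full documents.
        difference: idsOnly
          ? diff
          : {
              $filter: {
                input: "$allInRootA",
                as: "doc",
                cond: { $in: ["$$doc.reference.id", diff] },
              },
            },
      },
    },
  ];
}

// Placeholder ids; in the shell, pass the real UUID values instead.
const pipeline = buildExceptPipeline("root-1", "root-2");
console.log(JSON.stringify(pipeline, null, 2));
```

In the shell this would be used as db.getCollection('test').aggregate(buildExceptPipeline(root1Id, root2Id)), where root1Id and root2Id stand for the two root UUIDs. Passing true as the third argument gives the ids-only variant mentioned in a).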