Search code examples
mongodbmapreduceaggregation-frameworkdistinct-values

MongoDB distinct values on subdocuments


I have a little weird database structure it is as follows:

I have a document with normal properties, then I have a metadata property which is an array of objects.

metadata: {[
  {
     key: [key],
     value: [value]
  },
  ...
 ]}

Edit: There will never be a metadata sub-document which has a duplicate key

It was done this way to retain the order of the metadata objects

Now I want to get distinct values of a metadata object with a given key.

I want to find every distinct [value] where [key] = "x" using MongoDB. And have the distinct values returned in an array (not the document)

I guess this is not possible using the distinct command, but is this possible using an aggregation pipeline or do I have to use Map-Reduce?

Any suggestions?

Thanks in advance! :)


Solution

  • I presume you mean this:

    {
        "metadata": [
            { "key": "abc", "value": "borf" },
            { "key": "cdc", "value": "biff" }
        ]
    },
    {
        "metadata": [
            { "key": "bbc", "value": "barf" },
            { "key": "abc", "value": "borf" },
            { "key": "abc", "value": "barf" }
        ]
    }
    

    Where if you filter for "abc" and get the distinct "value" entries like this:

    db.collection.aggregate([
        { "$match": { "metadata.key": "abc" } },
        { "$unwind": "$metadata" },
        { "$match": { "metadata.key": "abc" } },
        { "$group": {
            "_id": "$metadata.value"
        }}
    ])
    

    Or even better:

    db.collection.aggregate([
        { "$match": { "metadata.key": "abc" } },
        { "$redact": {
            "$cond": {
                "if": { "$eq": [ { "$ifNull": [ "$key", "abc" ] }, "abc" ] },
                "then": "$$DESCEND",
                "else": "$$PRUNE"
            }
        }},
        { "$unwind": "$metadata" },
        { "$group": {
            "_id": "$metadata.value",
            "count": { "$sum": 1 }
        }}
    ])
    

    Which would basically give:

    { "_id": "barf", "count": 1 },
    { "_id": "borf", "count": 2 }
    

    But it is not possible for this to just be an array of "barf" and "borf". The distinct() method does an array of keys only, but it is also very limited. Therefore it can only do this:

    db.collection.distinct("metadata.value",{ "metadata.key": "abc" })
    [ "biff", "borf", "barf" ]
    

    Which is incorrect as a result. So just take the "document" results from above and apply some "post processing":

    db.collection.aggregate([
        { "$match": { "metadata.key": "abc" } },
        { "$redact": {
            "$cond": {
                "if": { "$eq": [ { "$ifNull": [ "$key", "abc" ] }, "abc" ] },
                "then": "$$DESCEND",
                "else": "$$PRUNE"
            }
        }},
        { "$unwind": "$metadata" },
        { "$group": {
            "_id": "$metadata.value"
        }}
    ]).map(function(doc) {
        return doc._id;
    })
    

    And that result is a plain array of just the distinct values:

    [ "borf", "barf" ]