Search code examples
arraysmongodbaggregation-frameworkdistinct

Get all values of arrays from mongo documents


I have a mongo collection with documents containing arrays:

   { item: "journal", qty: 25, tags: ["blank", "red"], dim_cm: [ 14, 21 ] },
   { item: "notebook", qty: 50, tags: ["red", "blank"], dim_cm: [ 14, 21 ] },
   { item: "paper", qty: 100, tags: ["red", "blank", "plain"], dim_cm: [ 14, 21 ] },
   { item: "planner", qty: 75, tags: ["blank", "red"], dim_cm: [ 22.85, 30 ] },
   { item: "postcard", qty: 45, tags: ["blue"], dim_cm: [ 10, 15.25 ] }

I would like to get a single array containing all distinct values, such as:

tags: ["blank", "red", blue"] and dim_cm: [14,21,22.85,30,10,15.25]

Is this possible with an aggregation pipeline?


Solution

  • You can use $group with $reduce and $setIntersection:

    1. $group all documents to create one array of arrays per key
    2. flatten each array with $reduce and make it a set using $setIntersection.
    db.collection.aggregate([
      {$group: {_id: null, tags: {$push: "$tags"}, dim_cm: {$push: "$dim_cm"}}},
      {
        $project: {
          _id: 0,
          tags: {
            $setIntersection: [
              {$reduce: {
                  input: "$tags",
                  initialValue: [],
                  in: {$concatArrays: ["$$value", "$$this"]}
                }
              }
            ]
          },
          dim_cm: {
            $setIntersection: [
              {$reduce: {
                  input: "$dim_cm",
                  initialValue: [],
                  in: {$concatArrays: ["$$value", "$$this"]}
                }
              }
            ]
          }
        }
      }
    ])
    

    See how it works on the playground example

    Another way is:

    db.collection.aggregate([
      {$unwind: "$tags"},
      {
        $group: {
          _id: null,
          tags: {$addToSet: "$tags"},
          dim_cm: {$addToSet: "$dim_cm"
        }
      },
      {$unwind: "$dim_cm"},
      {$unwind: "$dim_cm"},
      {
        $group: {
          _id: null,
          tags: {$first: "$tags"},
          dim_cm: {$addToSet: "$dim_cm"}
        }
      }
    ])
    

    Playground - unwind

    Which you can split into two queries which will be much faster:

    db.collection.aggregate([
      {$unwind: "$tags"},
      {
        $group: {
          _id: null,
          tags: {$addToSet: "$tags"}
        }
      },
    ])
    

    A 3rd option is:

    db.collection.aggregate([
      {
        $project: {
          _id: 0,
          arr: {
            $concatArrays: [
              {$map: {input: "$tags", as: "item", in: {k: "tag",  v: "$$item"}}},
              {$map: {input: "$dim_cm", as: "item", in: {k: "dim_cm", v: "$$item"}}}
            ]
          }
        }
      },
      {$unwind: "$arr"},
      {
        $group: {
          _id: null,
          tags: {
            $addToSet: {$cond: [{$eq: ["$arr.k", "tag"]}, "$arr.v", "$$REMOVE"]}
          },
          dim_cm: {
            $addToSet: {$cond: [{$eq: ["$arr.k", "dim_cm"]}, "$arr.v", "$$REMOVE"]}
          }
        }
      }
    ])
    

    Playground 3rd