I have a mongo collection with documents containing arrays:
{ item: "journal", qty: 25, tags: ["blank", "red"], dim_cm: [ 14, 21 ] },
{ item: "notebook", qty: 50, tags: ["red", "blank"], dim_cm: [ 14, 21 ] },
{ item: "paper", qty: 100, tags: ["red", "blank", "plain"], dim_cm: [ 14, 21 ] },
{ item: "planner", qty: 75, tags: ["blank", "red"], dim_cm: [ 22.85, 30 ] },
{ item: "postcard", qty: 45, tags: ["blue"], dim_cm: [ 10, 15.25 ] }
I would like to get a single array containing all distinct values, such as:
tags: ["blank", "red", blue"] and dim_cm: [14,21,22.85,30,10,15.25]
Is this possible with an aggregation pipeline?
You can use $group
with $reduce
and $setIntersection
:
$group
all documents to create one array of arrays per key$reduce
and make it a set using $setIntersection
.db.collection.aggregate([
{$group: {_id: null, tags: {$push: "$tags"}, dim_cm: {$push: "$dim_cm"}}},
{
$project: {
_id: 0,
tags: {
$setIntersection: [
{$reduce: {
input: "$tags",
initialValue: [],
in: {$concatArrays: ["$$value", "$$this"]}
}
}
]
},
dim_cm: {
$setIntersection: [
{$reduce: {
input: "$dim_cm",
initialValue: [],
in: {$concatArrays: ["$$value", "$$this"]}
}
}
]
}
}
}
])
See how it works on the playground example
Another way is:
db.collection.aggregate([
{$unwind: "$tags"},
{
$group: {
_id: null,
tags: {$addToSet: "$tags"},
dim_cm: {$addToSet: "$dim_cm"
}
},
{$unwind: "$dim_cm"},
{$unwind: "$dim_cm"},
{
$group: {
_id: null,
tags: {$first: "$tags"},
dim_cm: {$addToSet: "$dim_cm"}
}
}
])
Which you can split into two queries which will be much faster:
db.collection.aggregate([
{$unwind: "$tags"},
{
$group: {
_id: null,
tags: {$addToSet: "$tags"}
}
},
])
A 3rd option is:
db.collection.aggregate([
{
$project: {
_id: 0,
arr: {
$concatArrays: [
{$map: {input: "$tags", as: "item", in: {k: "tag", v: "$$item"}}},
{$map: {input: "$dim_cm", as: "item", in: {k: "dim_cm", v: "$$item"}}}
]
}
}
},
{$unwind: "$arr"},
{
$group: {
_id: null,
tags: {
$addToSet: {$cond: [{$eq: ["$arr.k", "tag"]}, "$arr.v", "$$REMOVE"]}
},
dim_cm: {
$addToSet: {$cond: [{$eq: ["$arr.k", "dim_cm"]}, "$arr.v", "$$REMOVE"]}
}
}
}
])