Given this structure:
{
  "_id": ObjectId("56n606c39q49b80adfe6d17b"),
  "data": {
    "field1": [
      {
        "subfield1.1": [
          {
            "val1.1.1": a
          },
          {
            "val1.1.2": b
          }
        ]
      },
      {
        "subfield1.2": [
          {
            "val1.2.1": c
          },
          {
            "val1.2.2": d
          }
        ]
      }
    ]
  }
}
How can I write a query that counts the documents whose maximum "val" is unique (occurs only once), considering all the "val"s in every "subfield" of a given field?
Facts to consider:
I am somewhat new to NoSQL. In SQL I'd probably approach this problem with a self join, but here, even though the Aggregation framework should be able to do it, I couldn't find a way to get even close to a solution.
Assuming a is the maximum value, this document SHOULD be counted:
{
  "_id": ObjectId("56n606c39q49b80adfe6d17b"),
  "data": {
    "field1": [
      {
        "subfield1.1": [
          {
            "val1.1.1": a
          },
          {
            "val1.1.2": b
          }
        ]
      },
      {
        "subfield1.2": [
          {
            "val1.2.1": c
          },
          {
            "val1.2.2": d
          }
        ]
      }
    ]
  }
}
This document SHOULDN'T be counted:
{
  "_id": ObjectId("56n606c39q49b80adfe6d17b"),
  "data": {
    "field1": [
      {
        "subfield1.1": [
          {
            "val1.1.1": a
          },
          {
            "val1.1.2": a
          }
        ]
      },
      {
        "subfield1.2": [
          {
            "val1.2.1": b
          },
          {
            "val1.2.2": c
          }
        ]
      }
    ]
  }
}
This document SHOULDN'T be counted either:
{
  "_id": ObjectId("56n606c39q49b80adfe6d17b"),
  "data": {
    "field1": [
      {
        "subfield1.1": [
          {
            "val1.1.1": a
          },
          {
            "val1.1.2": b
          }
        ]
      },
      {
        "subfield1.2": [
          {
            "val1.2.1": a
          },
          {
            "val1.2.2": c
          }
        ]
      }
    ]
  }
}
This document SHOULD be counted (even if b is repeated):
{
  "_id": ObjectId("56n606c39q49b80adfe6d17b"),
  "data": {
    "field1": [
      {
        "subfield1.1": [
          {
            "val1.1.1": a
          },
          {
            "val1.1.2": b
          }
        ]
      },
      {
        "subfield1.2": [
          {
            "val1.2.1": b
          },
          {
            "val1.2.2": c
          }
        ]
      }
    ]
  }
}
Any idea would be welcomed. Thank you!
Aggregation is absolutely what you need here. It may need some modification, but something like this should hopefully work for you:
db.collection.aggregate([
  /*
    Step 1: We need to unravel the multi-dimensional array first; otherwise we can't search for a unique maximum across all of the subfields.
  */
  // Unravel the outer array first.
  {$unwind: "$data.field1"},
  // Simplify the unwound result so that we have a flat field path rather than a nested one.
  // "subfield1" and "val1" are placeholders; adjust them to your actual field names.
  {$project: {
    vals: "$data.field1.subfield1"
  }},
  // Now unravel the inner array.
  {$unwind: "$vals"},
  // Another simplification step, for the same reason as the previous projection.
  {$project: {
    val: "$vals.val1"
  }},
  /*
    Step 2: Count the array elements that have the same value and come from the same source document.
  */
  // Creating the counts is easy: group by source document ID and value, adding 1 to the total for each entry.
  {$group: {
    _id: {
      _id: "$_id",
      val: "$val"
    },
    count: {$sum: 1}
  }},
  /*
    Step 3: Once we have our counts, we can retrieve the maximum value for each source document.
  */
  // First, sort by descending value so that the maximum value is the first one we encounter.
  {$sort: {
    "_id.val": -1
  }},
  // With the entries in descending order, taking the first entry for each source document gives us each document's maximum along with its count.
  {$group: {
    _id: "$_id._id",
    max: {
      $first: {
        val: "$_id.val",
        count: "$count"
      }
    }
  }},
  // Just for kicks, let's simplify once more by unnesting the data.
  {$project: {
    val: "$max.val",
    count: "$max.count"
  }},
  /*
    Step 4: Now we just need to filter and count our results.
  */
  // Any result with a count of 1 is a unique maximum.
  {$match: {
    count: 1
  }},
  // Finally, count the matching source documents. $count needs MongoDB 3.4+;
  // on older versions use {$group: {_id: null, total: {$sum: 1}}} instead.
  {$count: "total"}
])
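If you want to sanity-check the stage logic without a MongoDB server, here is a rough plain-JavaScript mock of the same four steps, run against the example documents from the question. It is not how MongoDB executes the pipeline. It assumes the symbolic values can be encoded as numbers with a > b > c > d (4, 3, 2, 1), gives the examples hypothetical _ids 1 to 4 (the question reuses one ObjectId), and uses Object.values() to step over the dotted key names instead of the placeholder paths. The helper names (makeDoc, unwindVals, countPairs, maxPerDoc, uniqueMaxDocs) are made up for illustration.

```javascript
// Offline mock of the four pipeline steps, for checking the logic only.
// ASSUMPTION: symbolic values encoded as numbers so that a > b > c > d.
const a = 4, b = 3, c = 2, d = 1;

// Builds one of the question's example documents; the _id values are hypothetical.
function makeDoc(_id, [v1, v2, v3, v4]) {
  return {
    _id,
    data: {
      field1: [
        { "subfield1.1": [{ "val1.1.1": v1 }, { "val1.1.2": v2 }] },
        { "subfield1.2": [{ "val1.2.1": v3 }, { "val1.2.2": v4 }] },
      ],
    },
  };
}

// Step 1: flatten every document into {_id, val} rows (like the $unwind/$project stages).
// Object.values() skips over the dotted, per-element key names.
function unwindVals(docs) {
  const rows = [];
  for (const doc of docs) {
    for (const subfield of doc.data.field1) {
      for (const vals of Object.values(subfield)) {
        for (const entry of vals) {
          for (const val of Object.values(entry)) rows.push({ _id: doc._id, val });
        }
      }
    }
  }
  return rows;
}

// Step 2: count how often each value occurs within each source document (like $group + $sum).
function countPairs(rows) {
  const groups = new Map();
  for (const { _id, val } of rows) {
    const key = JSON.stringify([_id, val]);
    if (!groups.has(key)) groups.set(key, { _id, val, count: 0 });
    groups.get(key).count += 1;
  }
  return [...groups.values()];
}

// Step 3: sort by descending value and keep the first (maximum) group per document
// (like $sort followed by $group with $first).
function maxPerDoc(groups) {
  const firstByDoc = new Map();
  for (const g of [...groups].sort((x, y) => y.val - x.val)) {
    if (!firstByDoc.has(g._id)) firstByDoc.set(g._id, g);
  }
  return [...firstByDoc.values()];
}

// Step 4: keep documents whose maximum occurs exactly once (like $match: {count: 1}).
function uniqueMaxDocs(docs) {
  return maxPerDoc(countPairs(unwindVals(docs))).filter((g) => g.count === 1);
}

const examples = [
  makeDoc(1, [a, b, c, d]), // should be counted
  makeDoc(2, [a, a, b, c]), // should not be counted
  makeDoc(3, [a, b, a, c]), // should not be counted
  makeDoc(4, [a, b, b, c]), // should be counted (b repeats, a does not)
];
const matched = uniqueMaxDocs(examples);
console.log(matched.map((g) => g._id), matched.length); // ids 1 and 4, count 2
```

The length of the filtered list plays the role of the final $count stage.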
This is admittedly a complicated query, and it's difficult to guarantee that it works without knowing your actual document structure. That said, the comments should give you enough information to adapt it to your needs. If you run into any problems, please let me know and I'll do what I can to help you nail down the issues.
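The part most likely to need adjusting is Step 1. In the structure from the question, every element of field1 uses a different key that contains dots ("subfield1.1", "subfield1.2", ...), and aggregation field paths treat dots as nesting, so "$data.field1.subfield1" won't match anything there. One possible workaround, sketched below under the assumption of MongoDB 3.4.4 or newer, is $objectToArray, which turns {"key": value} into [{k: "key", v: value}] so the key names stop mattering. This is untested against your data, and step1DottedKeys is just an illustrative name.

```javascript
// Hypothetical replacement for the Step 1 stages, for documents whose keys
// contain dots and differ per element (e.g. "subfield1.1", "val1.1.1").
// ASSUMPTION: MongoDB 3.4.4+ (needed for $objectToArray).
const step1DottedKeys = [
  // One document per element of data.field1, e.g. {"subfield1.1": [...]}.
  { $unwind: "$data.field1" },
  // {"subfield1.1": [...]} becomes [{k: "subfield1.1", v: [...]}].
  { $project: { subfields: { $objectToArray: "$data.field1" } } },
  { $unwind: "$subfields" },
  // One document per {"val1.1.1": a} entry.
  { $unwind: "$subfields.v" },
  // Same trick for the innermost objects: [{k: "val1.1.1", v: a}].
  { $project: { pairs: { $objectToArray: "$subfields.v" } } },
  { $unwind: "$pairs" },
  // Leaves {_id, val}, the same shape the original Step 1 produces.
  { $project: { val: "$pairs.v" } },
];

// Usage in the mongo shell (collection name is a placeholder):
// db.collection.aggregate(step1DottedKeys.concat([/* Steps 2-4 unchanged */]))
```

Since these stages end with the same {_id, val} shape, Steps 2 to 4 can stay as they are.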