Search code examples
mongodbpysparkmongodb-querypymongo

Sum of values in Nested Json Array objects via MongoDB Query


Wanted to find the sum of Nested json array via mongodb query:

$group is asking for _id for grouping each obj in array, which is not working.

The documents is like this :

{
 point : "3",
 created_date : "2022-06-19",
 "name" : "grpA",
 "value_format" : [
 {
    "data" : {
        "A" : 10,
        "B" : 20,
        "C" : 30,
        "D" : 40,
     },
     "key" : "key1"
  },
     "data" : {
        "A" : 50,
        "B" : 60,
        "C" : 70,
        "D" : 80,
     },
     "key" : "key2"
  }
]
}


Expected output : 

{
  "key" : "anything",
  "val" : {
             "sumA": 60,
             "sumB": 80,
             "sumC": 100,
             "sumD": 120,
          }
}

Solution

  • Query1

    • assuming that A,B,C,D is part of your schema and not unknown data
    • and assuming that you want the sum for each document you can do this

    Playmongo

    aggregate(
    [{"$set": 
       {"sumA": {"$sum": "$value_format.data.A"},
        "sumB": {"$sum": "$value_format.data.B"},
        "sumC": {"$sum": "$value_format.data.C"},
        "sumD": {"$sum": "$value_format.data.D"}}}])
    

    Query2

    • assuming dynamic fields (data on fields(unknown schema) => complicated and slow query)
    • and want to sum all collection not just in 1 doc
    • unwind the value_format
    • make all pair documents A:10 to {"k" : A, "v" 10}
    • we need this to group by "k" (this is the problem with dynamic keys)
    • unwind data
    • group by this k
    • and then back again to have the sumA : ... arrayToObject
    • group by nil, and collect all in 1 document

    *most likely you need query2, the complexity comes from the data on schema, and this is kinda fast way, it could be worse

    Playmongo

    aggregate(
    [{"$unwind": "$value_format"},
     {"$project": {"data": {"$objectToArray": "$value_format.data"}}},
     {"$unwind": "$data"},
     {"$group": {"_id": "$data.k", "sum": {"$sum": "$data.v"}}},
     {"$replaceRoot": 
       {"newRoot": 
         {"$arrayToObject": 
           [[{"k": {"$concat": ["sum", "$_id"]}, "v": "$sum"}]]}}},
     {"$group": {"_id": null, "val": {"$mergeObjects": "$$ROOT"}}}])
    

    Query3

    • this sums with unknown fields(like query2) but finds the sums for each document
    • uses one extra collection with 1 empty document one_doc_collection=[{}]
    • this is like a trick, when we want to use stage operators to an array, and we dont want to unwind all documents (we use the lookup and unwind locally)
    • query is similar to query 2, just do those inside the lookup its big because data on fields, both in the original data and in the output

    Playmongo

    aggregate(
    [{"$project": 
       {"value_format": 
         {"$map": 
           {"input": "$value_format",
            "in": {"$objectToArray": "$$this.data"}}}}},
     {"$set": 
       {"value_format": 
         {"$reduce": 
           {"input": "$value_format",
            "initialValue": [],
            "in": {"$concatArrays": ["$$value", "$$this"]}}}}},
     {"$lookup": 
       {"from": "one_doc_collection",
        "pipeline": 
         [{"$set": {"value_format": "$$value_format"}},
           {"$unwind": "$value_format"},
           {"$group": 
             {"_id": "$value_format.k", "sum": {"$sum": "$value_format.v"}}},
           {"$replaceRoot": 
             {"newRoot": 
               {"$arrayToObject": 
                 [[{"k": {"$concat": ["sum", "$_id"]}, "v": "$sum"}]]}}}],
        "as": "sums",
        "let": {"value_format": "$value_format"}}},
     {"$project": 
       {"sums": 
         {"$reduce": 
           {"input": "$sums",
            "initialValue": {},
            "in": {"$mergeObjects": ["$$value", "$$this"]}}}}}])