Search code examples
mongodbaggregation-frameworknosql-aggregation

MongoDB Aggregation pipeline: group array elements by a field without knowing the field value


I have a collection of football matches shaped this way:

{
  _id: ObjectId("SomeUniqueMatchID"),
  players: [
    { team: "Arsenal", name: "Saka", distanceRan: 8590},
    { team: "Arsenal", name: "Aubameyang", distanceRan: 9230}
    // ...remaining players for both teams
  ],
  timestamp: 129380193,
  // other match info
}

I want to query for the average distance ran by Aubameyang and Saka when playing together, regardless of the team they are in, as long as they are playing in the same.

I understand as far $unwind on the players field, then $match and finally using $group to calculate the average, but I have no clue what to write on the $match condition when I don't know the team before running the query.


Solution

  • Query1

    • unwind players
    • match to keep only the 2 players names
    • group by match_id team and push the distances in an array
    • match and keep only arrays of size =2 (this is the key, size=2 only if those players played both in the same match and in the same team)
      (if you see on the example code id=2,id=3 are ignored because those players never played both in the same match and same team)
    • uwnind distances (here we know that all those are valid ones)
    • group by null(all collection 1 group) and average the distance (you don't care about what match, what team or which player, so only information about the avgDistance is kept)

    Test code here

    aggregate(
    [{"$unwind": {"path": "$players"}},
      {"$match": 
        {"$expr": 
          {"$or": 
            [{"$eq": ["$players.name", "Saka"]},
              {"$eq": ["$players.name", "Aubameyang"]}]}}},
      {"$group": 
        {"_id": {"_id": "$_id", "team": "$players.team"},
          "distances": {"$push": "$players.distanceRan"}}},
      {"$match": {"$expr": {"$eq": [{"$size": "$distances"}, 2]}}},
      {"$unwind": {"path": "$distances"}},
      {"$group": {"_id": null, "avgDistanceRan": {"$avg": "$distances"}}},
      {"$unset": ["_id"]}])
    

    Query2

    • local it uses reduce, and the fact that 1 player in 1 match, max 1 time and max in 1 team (query1 works even if this wasn't true)
    • reduce to keep those 2 distances, we add on distance array only if team is the same, and the name is one of the two
    • again keep only distance count=2 (to make sure both players in the same match on the same team)
    • unwind and group by null,average like above

    Test code here

    aggregate(
    [{"$set": 
        {"players": 
          {"$reduce": 
            {"input": "$players",
              "initialValue": {"team": null, "distances": []},
              "in": 
              {"$cond": 
                [{"$and": 
                    [{"$or": 
                        [{"$eq": ["$$this.name", "Saka"]},
                          {"$eq": ["$$this.name", "Aubameyang"]}]},
                      {"$or": 
                        [{"$eq": ["$$value.team", null]},
                          {"$eq": ["$$value.team", "$$this.team"]}]}]},
                  {"team": "$$this.team",
                    "distances": 
                    {"$concatArrays": 
                      ["$$value.distances", ["$$this.distanceRan"]]}},
                  "$$value"]}}}}},
      {"$match": {"$expr": {"$eq": [{"$size": "$players.distances"}, 2]}}},
      {"$unwind": {"path": "$players.distances"}},
      {"$group": 
        {"_id": null, "avgdistanceRan": {"$avg": "$players.distances"}}},
      {"$unset": ["_id"]}])