Search code examples
mongodbmongodb-querynosqlaggregation-frameworknosql-aggregation

lookup on a specific array element efficiently


Database Schema

admins

{
  "timestamp": "2023-03-08T12:09:19.662015",
  "name": "ahmad",
  "surname": "salman",
  "email": "a.s@ug.bilkent.edu.tr",
  "pendingReviews": [
    {
      "_id": "e5b66416-25a1-41d7-a7f3-e2bb65f0e6ae",
      "timestamp": "2023-03-13T19:00:50.181141",
      "postID": "95c5f3f6-709a-4da8-b8e8-368c9026da47",
      "reason": "Inaccurate",
      "description": "this is inaccurate",
      "userID": "03ca4e23-bea7-45e2-a94a-aee82aab5c09"
    },
    {
      "_id": "45asr2t3-hs82-41d7-a7f3-e2bb65f0e6ae",
      "timestamp": "2023-03-13T19:00:50.181141",
      "postID": "95c5f3f6-709a-4da8-b8e8-368c9026da47",
      "reason": "Inaccurate",
      "description": "this is inaccurate",
      "userID": "03ca4e23-bea7-45e2-a94a-aee82aab5c09"
    },
  ],
  "acceptedReviews": [
    {
      "_id": "22s4d9w2-hs82-41d7-a7f3-e2bb65f0e6ae",
      "timestamp": "2023-03-13T19:00:50.181141",
      "postID": "95c5f3f6-709a-4da8-b8e8-368c9026da47",
      "reason": "Inaccurate",
      "description": "this is inaccurate",
      "userID": "03ca4e23-bea7-45e2-a94a-aee82aab5c09"
    },
  ],
  "rejectedReviews": [
    {
      "_id": "mslw83hs8-hs82-41d7-a7f3-e2bb65f0e6ae",
      "timestamp": "2023-03-13T19:00:50.181141",
      "postID": "95c5f3f6-709a-4da8-b8e8-368c9026da47",
      "reason": "Inaccurate",
      "description": "this is inaccurate",
      "userID": "03ca4e23-bea7-45e2-a94a-aee82aab5c09"
    }
  ],
  "numberOfPendingRequests": 2,
}

reports

{
  "_id": "e5b66416-25a1-41d7-a7f3-e2bb65f0e6ae"
  "user": {
    "email": "a.s2@ug.bilkent.edu.tr"
  },
  "postings": [
    {
      "id": "95c5f3f6-709a-4da8-b8e8-368c9026da47",
      "isDeleted": false,
      "timestamp": "2023-02-13",
      "houseType": "apartment",
      "totalNumOfRoommates": 5,
      "location": {
        "neighborhood": "Oran",
        "district": "Çankaya",
        "city": "Adana"
      },
      "startDate": "2022-11-10",
      "endDate": "2022-11-15",
      "postingType": "House Seeker",
      "startPrice": 4,
      "endPrice": 10
    }
  ]
}

Each document in the reports collection holds only a single element in the postings array

Assuming I am trying to return the document with ID of 'e5b66416-25a1-41d7-a7f3-e2bb65f0e6ae' in pendingReviews array

Desired Result

"_id": "e5b66416-25a1-41d7-a7f3-e2bb65f0e6ae"
"postings": [
    {
      "id": "95c5f3f6-709a-4da8-b8e8-368c9026da47",
      "isDeleted": false,
      "timestamp": "2023-02-13",
      "houseType": "apartment",
      "totalNumOfRoommates": 5,
      "location": {
        "neighborhood": "Oran",
        "district": "Çankaya",
        "city": "Adana"
      },
      "startDate": "2022-11-10",
      "endDate": "2022-11-15",
      "postingType": "House Seeker",
      "startPrice": 4,
      "endPrice": 10
    }
  ]
"pendingReviews": [
    {
      "_id": "e5b66416-25a1-41d7-a7f3-e2bb65f0e6ae",
      "timestamp": "2023-03-13T19:00:50.181141",
      "postID": "95c5f3f6-709a-4da8-b8e8-368c9026da47",
      "reason": "Inaccurate",
      "description": "this is inaccurate",
      "userID": "03ca4e23-bea7-45e2-a94a-aee82aab5c09"
    }

So the '_id' and the 'pendingReviews._id' are the same which what I want to lookup with.

Current Approach

[
  {
    $lookup:
      {
        from: "reports",
        localField: "pendingReviews._id",
        foreignField: "_id",
        as: "result",
      },
  },
]

Current Result

{
  "_id": "f757596f-582d-4527-9ed6-ed37df2f54e4",
  "timestamp": "2023-03-08T12:09:19.662015",
  "name": "ahmad",
  "surname": "salman",
  "email": "a.s@ug.bilkent.edu.tr",
  "pendingReviews": [
    {
      "_id": "e5b66416-25a1-41d7-a7f3-e2bb65f0e6ae",
      "timestamp": "2023-03-13T19:00:50.181141",
      "postID": "95c5f3f6-709a-4da8-b8e8-368c9026da47",
      "reason": "Inaccurate Post Information",
      "description": "this is inaccurate",
      "userID": "03ca4e23-bea7-45e2-a94a-aee82aab5c09"
    }
  ],
  "acceptedReviews": [
    "accepted review 1",
    "accepted review 2"
  ],
  "rejectedReviews": [
    "rejected review 1",
    "rejected review 2"
  ],
  "numberOfPendingRequests": 2,
  "password": "$2b$12$l6iBlLtSgCaMsp5SlRaRjeE7ZFCSPsVVR/VG67NB6/XtAph8NyME.",
  "result": [
    {
      "_id": "e5b66416-25a1-41d7-a7f3-e2bb65f0e6ae",
      "user": {
        "email": "a.s2@ug.bilkent.edu.tr"
      },
      "postings": [
        {
          "id": "95c5f3f6-709a-4da8-b8e8-368c9026da47",
          "isDeleted": false,
          "timestamp": "2023-02-13",
          "houseType": "apartment",
          "totalNumOfRoommates": 5,
          "location": {
            "neighborhood": "Oran",
            "district": "Çankaya",
            "city": "Adana"
          },
          "startDate": "2022-11-10",
          "endDate": "2022-11-15",
          "postingType": "House Seeker",
          "startPrice": 4,
          "endPrice": 10
        }
      ]
    }
  ]
}
{
  "_id": {
    "$oid": "640f3b531d59d62b263b7dd0"
  },
  "timestamp": "2023-03-08T12:09:19.662015",
  "name": "ahmad",
  "surname": "salman",
  "email": "a.s2@ug.bilkent.edu.tr",
  "pendingReviews": [],
  "acceptedReviews": [],
  "rejectedReviews": [],
  "numberOfPendingRequests": 10,
  "password": "$2b$12$l6iBlLtSgCaMsp5SlRaRjeE7ZFCSPsVVR/VG67NB6/XtAph8NyME.",
  "result": []
}
{
  "_id": {
    "$oid": "640f41301d59d62b263b7dd2"
  },
  "timestamp": "2023-03-08T12:09:19.662015",
  "name": "ahmad",
  "surname": "salman",
  "email": "a.s3@ug.bilkent.edu.tr",
  "pendingReviews": [],
  "acceptedReviews": [],
  "rejectedReviews": [],
  "numberOfPendingRequests": 5,
  "password": "$2b$12$l6iBlLtSgCaMsp5SlRaRjeE7ZFCSPsVVR/VG67NB6/XtAph8NyME.",
  "result": []
}

Some users have only placeholder texts, but ultimately what I am seeking to achieve here is that I want the array element in the admins collection, in the pendingReviews array specified by the id 'e5b66416-25a1-41d7-a7f3-e2bb65f0e6ae' to be joined with the document with the same '_id' in reports collection without joining on everything for the entire DB.

I have checked this question earlier but it did not really help me: $lookup on ObjectId's in an array

Thank you in advance


Solution

  • This is one way of doing it.

    db.admins.aggregate([
      {
        "$unwind": "$pendingReviews"
      },
      {
        "$addFields": {
          "copy": "$pendingReviews"
        }
      },
      {
        "$lookup": {
          "from": "reports",
          "localField": "pendingReviews._id",
          "foreignField": "_id",
          "as": "joinData"
        }
      },
      {
        "$match": {
          $expr: {
            "$gt": [
              {
                "$size": "$joinData"
              },
              0
            ]
          }
        }
      },
      {
        "$unwind": "$joinData"
      },
      {
        "$addFields": {
          "joinData.pendingReviews": [
            "$copy"
          ]
        }
      },
      {
        "$replaceRoot": {
          "newRoot": "$joinData"
        }
      }
    ])
    

    First, we unwind the pendingReviews array. We store a copy of pendingReview in copy key. We use $lookup and join admins collection with reports collection, and store it in key joinData. Now, we filter docs where joinData is empty. Finally, we unwind joinData, append copy within joinData, and bring joinData object to the root.

    Playground link.