Search code examples
mongodbperformancenosqlaggregation-frameworkquery-optimization

Query with arbitrary number of filter conditions that come from querying the same collection


const score_schema = mongoose.Schema(
  {
    tester_id: {
      type: mongoose.Schema.Types.ObjectId,
      ref: "user",
      required: true,
    },
    test_id: {
      type: mongoose.Schema.Types.ObjectId,
      ref: "test",
      required: true,
    },
    score: {
      type: Number,
      required: true,
    },
  },
  {
    collection: `score`,
    timestamps: true,
  }
);

query 1:

A user is given. First a query to the score model is made to find all tests that this user has taken. This will result in an arbitrary number of tests and the score of each test.

query 2:

Another query is made to the score model to find all documents where the test_id matches the test_id returned from the above query and where the score of each test is greater than or equal to the score of the test_id returned from the query above.

This essentially means that the arbitrary number of documents returned from query 1 will be the number of filter conditions to query 2.

Problem:

Can the above 2 queries be combined into 1 query and thus one round trip to the MongoDB API? If not, then a solution of 2 separate queries is also acceptable.

const dummy_data = [
  {
    tester_id: "1",
    test_id: "1",
    score: 40
  },
  {
    tester_id: "1",
    test_id: "2",
    score: 50
  },
   {
    tester_id: "1",
    test_id: "3",
    score: 70
  },
  {
    tester_id: "2",
    test_id: "1",
    score: 50
  },
  {
    tester_id: "3",
    test_id: "2",
    score: 20
  },
   {
    tester_id: "3",
    test_id: "3",
    score: 60
  },
  {
    tester_id: "7",
    test_id: "5",
    score: 40
  },
  {
    tester_id: "8",
    test_id: "4",
    score: 50
  },
   {
    tester_id: "9",
    test_id: "4",
    score: 70
  },
]

EDITED:

The output should have the same schema as the original collection, score_schema.

For example, given tester_id of 1, the output should be:

[
  {
    tester_id: "2",
    test_id: "1",
    score: 50
  }
]

Explanation:

The tester_id of 1 took a total of 3 tests. There is only one document where another tester_id took a test equaling to one of the three taken by tester_id of 1 and had a score greater than the score generated by tester_id of 1.

The output should not include documents with the original tester_id. In this case, it is tester_id 1.


Solution

  • Here's one way to do it. (Comments are in the query.)

    db.score.aggregate([
      { // the tester of interest
        "$match": { "tester_id": "1" }
      },
      {
        "$lookup": {
          // lookup by test_id
          "from": "score",
          "localField": "test_id",
          "foreignField": "test_id",
          "let": { "myId": "$tester_id", "myScore": "$score" },
          "pipeline": [
            { // only return docs of different tester
              // and higher score
              "$match": {
                "$expr": {
                  "$and": [
                    { "$ne": [ "$tester_id", "$$myId" ] },
                    { "$gt": [ "$score", "$$myScore" ] }
                  ]
                }
              }
            }
          ],
          "as": "higherScores"
        }
      },
      { // only keep non-empty higherScores
        "$match": {
          "$expr": { "$gt": [ { "$size": "$higherScores" }, 0 ] }
        }
      },
      { // only field we care about now
        "$project": { "higherScores": 1 }
      },
      { // might be more than one
        "$unwind": "$higherScores"
      },
      { // hoist it to ROOT
        "$replaceWith": "$higherScores"
      },
      { // don't want _id
        "$unset": "_id"
      }
    ])
    

    Try it on mongoplayground.net.