
How to write an efficient mongo query for complex structures


I am using Express/Node.js (without Mongoose) with MongoDB as my database. I have a collection, Pages, whose documents look something like this:

{
  _id: ..
  Urls: [
    {
      IncomingUrl: "/test/test1",
      Status: "active",
    },
    {
      IncomingUrl: "/test/test2",
      Status: "active",
    }
  ],
  DraftUrls: [
    // same structure as Urls
  ],
  // other fields that aren't related to the question
}

Now, while creating a page, I look through this collection to check whether any URL in the urls array provided in the request body already exists in an existing page document.

If there are duplicates, I want to return the list of duplicate URLs in the response.

The problem is this: if I only needed the count, I could have used a find filter like

const filter = { $or: [
  { "Urls.IncomingUrl": { $in: urls } },
  { "DraftUrls.IncomingUrl": { $in: urls } }
] }

And then run a query like

db.collection(PageCollection).countDocuments(filter);

That would have given me the number of pages that contain at least one of the given URLs.

But in my case I need the duplicate URLs themselves rather than a count. If I use something like this

const duplicateUrlPages = await db.collection(PageCollection).find(filter).toArray();

and then run a nested for loop over urls and duplicateUrlPages, it would be too costly.

Could someone please suggest how I could efficiently get just the input URLs that already exist in any Page document under Urls.IncomingUrl or DraftUrls.IncomingUrl?

Example:

Suppose there are two documents like this in my DB:

Document1: {
  // ....
  Urls: [ 
    { IncomingUrl: "test1", status: "active" }, 
    // ... 
  ],
  DraftUrls: [
    { IncomingUrl: "test2", status: "inactive" },
    // ...
  ]
}

Document2: {
  // ....
  Urls: [ 
    { IncomingUrl: "test4", status: "active" }, 
    // ... 
  ],
  DraftUrls: [
    { IncomingUrl: "test10", status: "inactive" },
    // ...
  ]
}

And the body I pass to the POST request controller function is

{
  // ...
  urls: ["test1", "test2", "test3", "test4"]
}

Then I want a response array like:

["test1", "test2", "test4"]

since test1, test2 and test4 already exist.
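
As a reference point for the expected output, here is a minimal in-memory sketch of the same check. It assumes the matching pages have already been loaded into an array (for example with the find(filter).toArray() call above); findDuplicateUrls is a hypothetical helper name. Collecting the stored URLs in a Set avoids the nested loop, because each input URL is then checked with a single lookup. It still transfers every matching page to the application, which is the cost a server-side approach would avoid.

```javascript
// Hypothetical helper: returns the input URLs that already appear in any
// page's Urls or DraftUrls array, preserving the input order.
function findDuplicateUrls(pages, urls) {
  const existing = new Set();
  for (const page of pages) {
    // Treat a missing array as empty so partial documents do not throw.
    for (const entry of [...(page.Urls ?? []), ...(page.DraftUrls ?? [])]) {
      existing.add(entry.IncomingUrl);
    }
  }
  return urls.filter((url) => existing.has(url));
}

// Sample data mirroring the two documents in the example above.
const pages = [
  {
    Urls: [{ IncomingUrl: "test1", Status: "active" }],
    DraftUrls: [{ IncomingUrl: "test2", Status: "inactive" }],
  },
  {
    Urls: [{ IncomingUrl: "test4", Status: "active" }],
    DraftUrls: [{ IncomingUrl: "test10", Status: "inactive" }],
  },
];

console.log(findDuplicateUrls(pages, ["test1", "test2", "test3", "test4"]));
// → [ 'test1', 'test2', 'test4' ]
```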


Solution

  • You can use an aggregation pipeline:

    1. $match: applies the same $or filter object as your find query.
    2. $project: reshapes each document by using $filter to keep only the Urls and DraftUrls entries that match the input urls, then $map to reduce each entry to its IncomingUrl string. In essence, this gets rid of objects such as the one containing test10 in your sample documents.
    3. $project: concatenates Urls and DraftUrls into a single array named urls.
    4. $unwind: splits the new urls array into one document per URL.
    5. $group: collects these values into a single array, using $addToSet to ensure there are no duplicates.
    6. $project: an optional stage that removes the redundant _id field.

    const urls = ["test1", "test2", "test3", "test4"];
    
    const filter = { $or: [
      { "Urls.IncomingUrl": { $in: urls } },
      { "DraftUrls.IncomingUrl": { $in: urls } }
    ] }
    
    db.collection(PageCollection).aggregate([
      {
        $match: filter
      },
      {
        $project: {
          "Urls": {
            $map: {
              input: {
                $filter: {
                  input: "$Urls",
                  as: "u",
                  cond: {
                    $in: [
                      "$$u.IncomingUrl",
                      urls
                    ]
                  }
                }
              },
              as: "rls",
              in: "$$rls.IncomingUrl"
            }
          },
          "DraftUrls": {
            $map: {
              input: {
                $filter: {
                  input: "$DraftUrls",
                  as: "du",
                  cond: {
                    $in: [
                      "$$du.IncomingUrl",
                      urls
                    ]
                  }
                }
              },
              as: "drls",
              in: "$$drls.IncomingUrl"
            }
          }
        }
      },
      {
        $project: {
          urls: {
            $concatArrays: [
              "$DraftUrls",
              "$Urls"
            ]
          },
          _id: 0
        }
      },
      {
        $unwind: "$urls"
      },
      {
        $group: {
          _id: null,
          urls: {
            $addToSet: "$urls"
          }
        }
      },
      {
        $project: {
          _id: 0
        }
      }
    ])
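
In the Node.js driver, aggregate() returns a cursor, and an awaited .toArray() resolves it to an array of result documents. For this pipeline, the final $group emits a single document of the form { urls: [...] }, or no document at all when nothing matched, and $addToSet does not guarantee the order of its elements. A minimal sketch of unpacking that result follows; extractDuplicateUrls is a hypothetical helper, not part of the driver.

```javascript
// Hypothetical helper: `results` is the array obtained from
// `await db.collection(PageCollection).aggregate(pipeline).toArray()`.
function extractDuplicateUrls(results, inputUrls) {
  // No matching page means the $group stage emits nothing, so results is [].
  const found = new Set(results[0]?.urls ?? []);
  // Re-filter the input so the response follows the request's order.
  return inputUrls.filter((url) => found.has(url));
}

const inputUrls = ["test1", "test2", "test3", "test4"];

console.log(extractDuplicateUrls([{ urls: ["test4", "test1", "test2"] }], inputUrls));
// → [ 'test1', 'test2', 'test4' ]

console.log(extractDuplicateUrls([], inputUrls));
// → []
```

Filtering the input against a Set keeps the response order stable across runs, which a raw $addToSet array would not.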
    

    See HERE for a working example.
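
One caveat, offered tentatively: as far as I know, $filter and $map return null when their input field is missing or null, $concatArrays returns null if any argument is null, and $unwind then drops the document. If some pages can lack Urls or DraftUrls, their matches in the other array would be lost. Below is a sketch of a variant that defaults each array to [] with $ifNull and merges the two before filtering. buildDuplicateUrlPipeline is a hypothetical name, and the sketch has not been run against your data.

```javascript
// Hypothetical builder for a null-safe variant of the pipeline.
function buildDuplicateUrlPipeline(urls) {
  // Default a missing or null array to [] so $concatArrays never sees null.
  const orEmpty = (fieldPath) => ({ $ifNull: [fieldPath, []] });

  return [
    {
      $match: {
        $or: [
          { "Urls.IncomingUrl": { $in: urls } },
          { "DraftUrls.IncomingUrl": { $in: urls } },
        ],
      },
    },
    {
      $project: {
        _id: 0,
        urls: {
          // "$Urls.IncomingUrl" resolves to the array of IncomingUrl values,
          // so no $map is needed before filtering.
          $filter: {
            input: {
              $concatArrays: [
                orEmpty("$Urls.IncomingUrl"),
                orEmpty("$DraftUrls.IncomingUrl"),
              ],
            },
            as: "url",
            cond: { $in: ["$$url", urls] },
          },
        },
      },
    },
    { $unwind: "$urls" },
    { $group: { _id: null, urls: { $addToSet: "$urls" } } },
    { $project: { _id: 0 } },
  ];
}
```

It would be used as db.collection(PageCollection).aggregate(buildDuplicateUrlPipeline(urls)).toArray(), awaited inside an async handler.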