I am using Express/Node.js (without Mongoose) and MongoDB as my database. I have a collection, Pages, which looks something like this:
{
_id: ..
Urls: [
{
IncomingUrl: "/test/test1",
Status: "active",
},
{
IncomingUrl: "/test/test2",
Status: "active",
}
],
DraftUrls: [
// same structure as Urls
]
// other fields which aren't related to the question
}
Now, while creating a page, I search this collection to check whether any URL in the urls array I send in the request body already exists in an existing page document.
If there are duplicate URLs, I want to return the list of duplicates in the response.
The problem I am facing is this: if I only needed a count, I could use a find filter like
const filter = { $or: [
{ "Urls.IncomingUrl": { $in: urls } },
{ "DraftUrls.IncomingUrl": { $in: urls } }
] }
and then a query like
const count = await db.collection(PageCollection).countDocuments(filter);
which would give me the number of page documents containing at least one duplicate URL (not the duplicate URLs themselves).
But I need the duplicate URLs themselves rather than a count. If I use something like
const duplicateUrlPages = await db.collection(PageCollection).find(filter).toArray();
and then run a nested for loop over urls and duplicateUrlPages, it would be too costly.
Could someone please suggest how I can efficiently get just the list of input URLs that already exist in any Page document, under either Urls.IncomingUrl or DraftUrls.IncomingUrl?
Example:
Suppose there are 2 documents like this in my DB:
Document1: {
// ....
Urls: [
{ IncomingUrl: "test1", status: "active" },
// ...
],
DraftUrls: [
{ IncomingUrl: "test2", status: "inactive" },
// ...
]
}
Document2: {
// ....
Urls: [
{ IncomingUrl: "test4", status: "active" },
// ...
],
DraftUrls: [
{ IncomingUrl: "test10", status: "inactive" },
// ...
]
}
and I send the following body to a POST request controller function:
{
// ...
urls: ["test1", "test2", "test3", "test4"]
}
Then I want a response array like:
["test1", "test2", "test4"]
since test1, test2 and test4 already exist.
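To pin down the expected result, here is a minimal in-memory sketch in plain JavaScript (no MongoDB involved). The names findDuplicateUrls and pages are hypothetical. pages stands in for documents already fetched with find(filter).toArray(), and a missing Urls or DraftUrls array is assumed to mean "empty". The sketch also shows that post-processing those documents does not require a nested loop: with a Set of the requested URLs, each stored URL is checked only once.

```javascript
// In-memory sketch only: `pages` stands in for documents already fetched
// with find(filter).toArray(); this is not a MongoDB query.
function findDuplicateUrls(pages, urls) {
  const wanted = new Set(urls); // requested URLs, O(1) membership checks
  const found = new Set();      // de-duplicates hits across documents
  for (const page of pages) {
    // Assumption: a missing Urls/DraftUrls array is treated as empty.
    for (const entry of [...(page.Urls ?? []), ...(page.DraftUrls ?? [])]) {
      if (wanted.has(entry.IncomingUrl)) found.add(entry.IncomingUrl);
    }
  }
  // Report each duplicate once, in the order it was first requested.
  return [...wanted].filter((u) => found.has(u));
}

const pages = [
  {
    Urls: [{ IncomingUrl: "test1", status: "active" }],
    DraftUrls: [{ IncomingUrl: "test2", status: "inactive" }],
  },
  {
    Urls: [{ IncomingUrl: "test4", status: "active" }],
    DraftUrls: [{ IncomingUrl: "test10", status: "inactive" }],
  },
];

console.log(findDuplicateUrls(pages, ["test1", "test2", "test3", "test4"]));
// [ 'test1', 'test2', 'test4' ]
```

This runs in time proportional to the number of fetched URL entries plus the number of requested URLs. It does not depend on their product.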
You can use an aggregation pipeline:
- $match: replicates your $or conditions filter object.
- $project: reshapes the documents by $filter-ing the existing Urls and DraftUrls arrays so they only contain matches from your $match conditions, and $map-ping each remaining object to its IncomingUrl string. In essence, this gets rid of objects such as the one containing test10 in your sample documents.
- $project: outputs Urls and DraftUrls as a single array named urls.
- $unwind: splits the new urls array into individual documents.
- $group: groups these documents and adds the values to a single array using $addToSet, which ensures there are no duplicates.
- $project: optional stage to get rid of the redundant _id field.
const urls = ["test1", "test2", "test3", "test4"];
const filter = { $or: [
{ "Urls.IncomingUrl": { $in: urls } },
{ "DraftUrls.IncomingUrl": { $in: urls } }
] }
const result = await db.collection(PageCollection).aggregate([
{
$match: filter
},
{
$project: {
"Urls": {
$map: {
input: {
$filter: {
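input: { $ifNull: ["$Urls", []] }, // treat a missing array as empty so $concatArrays below doesn't return null
as: "u",
cond: {
$in: [
"$$u.IncomingUrl",
urls
]
}
}
},
as: "rls",
in: "$$rls.IncomingUrl"
}
},
"DraftUrls": {
$map: {
input: {
$filter: {
input: { $ifNull: ["$DraftUrls", []] }, // same for a missing DraftUrls array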
as: "du",
cond: {
$in: [
"$$du.IncomingUrl",
urls
]
}
}
},
as: "drls",
in: "$$drls.IncomingUrl"
}
}
}
},
{
$project: {
urls: {
$concatArrays: [
"$DraftUrls",
"$Urls"
]
},
_id: 0
}
},
{
$unwind: "$urls"
},
{
$group: {
_id: null,
urls: {
$addToSet: "$urls"
}
}
},
{
$project: {
_id: 0
}
}
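]).toArray();
// result is [] when nothing matches, otherwise [{ urls: [...] }];
// $addToSet does not guarantee the order of the values
const duplicateUrls = result.length > 0 ? result[0].urls : [];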
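To see what each stage hands to the next, here is a rough plain-JavaScript simulation of the pipeline over the two sample documents. It only illustrates the stage semantics and is not driver code. MongoDB runs these stages on the server, and the real $addToSet result may come back in a different order. The variable names are illustrative.

```javascript
// Rough simulation of the aggregation stages on the sample documents.
// Illustration only: the real stages run inside MongoDB.
const urls = ["test1", "test2", "test3", "test4"];
const wanted = new Set(urls);

const docs = [
  { Urls: [{ IncomingUrl: "test1" }], DraftUrls: [{ IncomingUrl: "test2" }] },
  { Urls: [{ IncomingUrl: "test4" }], DraftUrls: [{ IncomingUrl: "test10" }] },
];

// Helper: matching IncomingUrl strings of one array (missing array -> []).
const pick = (arr) =>
  (arr ?? []).filter((e) => wanted.has(e.IncomingUrl)).map((e) => e.IncomingUrl);

// $match: keep documents with at least one requested URL in either array.
const matched = docs.filter((d) => pick(d.Urls).length + pick(d.DraftUrls).length > 0);

// First $project ($filter + $map): only the matching URLs, as strings.
const projected = matched.map((d) => ({ Urls: pick(d.Urls), DraftUrls: pick(d.DraftUrls) }));
// -> [{ Urls: ["test1"], DraftUrls: ["test2"] }, { Urls: ["test4"], DraftUrls: [] }]

// Second $project ($concatArrays, DraftUrls first as in the pipeline).
const concatenated = projected.map((d) => ({ urls: [...d.DraftUrls, ...d.Urls] }));
// -> [{ urls: ["test2", "test1"] }, { urls: ["test4"] }]

// $unwind: one document per array element.
const unwound = concatenated.flatMap((d) => d.urls.map((u) => ({ urls: u })));
// -> [{ urls: "test2" }, { urls: "test1" }, { urls: "test4" }]

// $group (_id: null) with $addToSet: distinct values in a single array.
const grouped = [{ urls: [...new Set(unwound.map((d) => d.urls))] }];
console.log(JSON.stringify(grouped));
```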