This involves 3 collections:
Example of a document in posts:
{
"_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"),
"__v" : 6,
"author" : ObjectId("57fbe2ac3cfb9e061df86ebb"),
"postSubCategories" : [
ObjectId("5806344baa0bbf284a2316e4")//reference to document in postsubcategories collection
],
"postSuperCategories" : [
ObjectId("580679958a5f5f448ba5aae9"),
ObjectId("580679958a5f5f448ba5aaf2")//references to documents in postsupercategories collection
],
"publishedDate" : ISODate("2016-10-10T04:00:00.000Z"),
"state" : "published",
"templateName" : ObjectId("57fbf3977ccbc906ed87cef3"),
"title" : "My title",
"topics" : []}
My query is
db.posts.aggregate([
{'$unwind':
{'path':"$postSubCategories"}
},
{'$lookup': {
'from':"postsubcategories",
'localField': "postSubCategories",
'foreignField': "_id",
'as': "subObject"
}},
{'$unwind':
{'path':"$postSuperCategories"}
},
{'$lookup': {
'from':"postsupercategories",
'localField': "postSuperCategories",
'foreignField': "_id",
'as': "superObject"
}},
{'$match': {
'$or':
[{ "subObject.searchKeywords": "home monitor" },
{ "superObject.searchKeywords": "home monitor" }]
}
},
{'$match': {
"state": "published"
}}
Both postsubcategories and postsupercategories collections contain a field called searchKeywords that is an array of text within their documents. I want to be able to query those searchKeywords fields and return the posts documents that match. I need a deduped set of returned _ids.
The query is returning four documents. Example:
ObjectId("57fbf3ce7ccbc906ed87cef6")
ObjectId("57fbf3ce7ccbc906ed87cef6")
ObjectId("57fbf40b7ccbc906ed87cef7")
ObjectId("57fbf40b7ccbc906ed87cef7")
I understand why it is returning 4. One document contains postSubCategories object 5806344baa0bbf284a2316e4
and postSuperCategories id 580679958a5f5f448ba5aae9
.
The second document contains postSubCategories object 5806344baa0bbf284a2316e4
and postSuperCategories 580679958a5f5f448ba5aaf2
. This repeats for the second post
ObjectId("57fbf3ce7ccbc906ed87cef6")
ObjectId("57fbf40b7ccbc906ed87cef7")
I know that technically the 2 matching _ids from the list of 4 are not the exact same because they each contain a different postSuperCategories object, but at this point I don't care about that field anymore and just need a singular posts document because I need access to the other fields.
Any help would be greatly appreciated. I have tried looking into $group
, $addToSet
, and $setUnion
and have been unsuccessful so far.
You can add one $group
retrieving distinct _id
with the first value found for each of the properties you want to extract per _id
.
for the $group
aggregation :
{
'$group': {
_id: '$_id',
item: { $first: "$$ROOT" }
}
}
which will give you the first item for root document in item
field :
{ "_id" : ObjectId("57fbf40b7ccbc906ed87cef7"), "items" : { "_id" : ObjectId("57fbf40b7ccbc906ed87cef7"), "__v" : 6, "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "postSubCategories" : ObjectId("5806344baa0bbf284a2316e4"), "postSuperCategories" : ObjectId("580679958a5f5f448ba5aae9"), "publishedDate" : ISODate("2016-12-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef4"), "title" : "My title2", "topics" : [ "a", "b" ], "subObject" : [ { "_id" : ObjectId("5806344baa0bbf284a2316e4"), "searchKeywords" : "home monitor" } ], "superObject" : [ { "_id" : ObjectId("580679958a5f5f448ba5aae9"), "searchKeywords" : "home monitor2" } ] } }
{ "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"), "items" : { "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"), "__v" : 6, "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "postSubCategories" : ObjectId("5806344baa0bbf284a2316e4"), "postSuperCategories" : ObjectId("580679958a5f5f448ba5aae9"), "publishedDate" : ISODate("2016-10-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"), "title" : "My title", "topics" : [ ], "subObject" : [ { "_id" : ObjectId("5806344baa0bbf284a2316e4"), "searchKeywords" : "home monitor" } ], "superObject" : [ { "_id" : ObjectId("580679958a5f5f448ba5aae9"), "searchKeywords" : "home monitor2" } ] } }
Otherwise, for selecting a field in the response :
{
'$group': {
_id: '$_id',
author: {
$first: '$author'
},
publishedDate: {
$first: '$publishedDate'
},
state: {
$first: '$state'
},
templateName: {
$first: '$templateName'
},
title: {
$first: '$title'
},
topics: {
$first: '$topics'
}
}
}
You will get something like :
{ "_id" : ObjectId("57fbf40b7ccbc906ed87cef7"), "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "publishedDate" : ISODate("2016-10-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"), "title" : "My title", "topics" : [ ] }
{ "_id" : ObjectId("57fbf3ce7ccbc906ed87cef6"), "author" : ObjectId("57fbe2ac3cfb9e061df86ebb"), "publishedDate" : ISODate("2016-10-10T04:00:00Z"), "state" : "published", "templateName" : ObjectId("57fbf3977ccbc906ed87cef3"), "title" : "My title", "topics" : [ ] }