I'm relatively new to mongo and I have a collection that looks like this:
[
{
"stored": {
"box": [
{
"parcelId": "uwb1",
"status": "ACTIVE"
}
]
},
"checked": {
"box": [
{
"parcelId": "uwb1",
"status": "ACTIVE"
}
]
}
},
{
"stored": {
"box": [
{
"parcelId": "aqrf123",
"status": "PENDING"
}
]
},
"checked": {
"box": [
{
"parcelId": "aqrf123",
"status": "PENDING"
}
]
}
},
{
"checked": {
"box": [
{
"parcelId": "zuz873",
"status": "ACTIVE"
}
]
}
}
]
Some observations regarding the data:
checked
field but might not have the stored
fieldchecked
and stored
fields have the same schemabox
field and we can make the assumption that the box
field will always have 1 element in the array (only 1, not more, not less)What I'm trying to achieve is to get the documents sorted by that status
field, which is like an enum, and it can have 3 values - ACTIVE
, PENDING
and REJECTED
.
stored
field exist, I'll take it from there and disregard the checked
field. checked
field, which is guaranteed to exist, as previously mentioned.projection
to reduce the amount of data from the documents (which would probably make the whole operation faster).How I've tried to achieve this is by using an aggregation which looks like this:
db.getCollection('entries')
.aggregate([{
$addFields: {
sortStatus: {
$ifNull: [{
$let: {
vars: {
box: {
$arrayElemAt: [
"$stored.box", 0
]
}
},
in: "$$box.status"
}
}, {
$let: {
vars: {
box: {
$arrayElemAt: [
"$checked.box", 0
]
}
},
in: "$$box.status"
}
}]
}
}
},
{
$sort: {
sortStatus: 1
}
}
], {
allowDiskUse: true
})
Which seems to do the job but it feels quite slow. Also there's the allowDiskUse
which makes me uncomfortable a bit. If I leave it out, I get the Sort exceeded memory limit of x bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in
error message.
So my questions are:
allowDiskUse
option when doing an aggregation?.sort({"statusField": 1})
? This would be the last resort option, as I'd have to migrate the existing data.Your sortStatus
field value can be arrived at by using the following:
{ $addFields: { sortStatus: { $ifNull: [ "$stored.box.status", "$checked.box.status" ] } } },
Will this make the query faster? no, but the code is simpler.
(1) Are there faster alternatives, be it with or without aggregation?
I don't know, at this momemt.
(2) Are there any risks in using the allowDiskUse option when doing an aggregation?
Using the allowDiskUse:true
option means that the sort operation uses the disk for additional resources when the memory (RAM) for this operation exceeds its limit. The disk IO is very slow compared to the memory, so the "risk" is a much slower sort operation. This option becomes mandatory when the sort operation requires more memory than the limit of 100MB (see the documentation on Sort and Memory Restrictions in Aggregation).
(3) Would it be better (or is it the "mongo" way) to alter a bit the document structure and add that sortable field to the root of the document, add an index for it and just use .sort({"statusField": 1})? This would be the last resort option, as I'd have to migrate the existing data.
Creating a new status field and an index on this field means new considerations:
These can affect the write performance of your application.
But, the query will become a simple sort. With a large number of documents in the collection, the index used for sorting may or may not fit the memory during operation. You cannot be sure how this option can help without some real trials.
Here is some documentaion on Indexing Strategies.