Context: I have created a service that deals with users data. I can not modify source and order of data in upstream system (usually it is a partial snapshot of a user or a delta (e.g. user's new post). I also must deal with duplicated information. I modeled my persistent layer around document db (MongoDB) using a single 'users' collection because I deal with independent users that do not have any relation, I don't need to update multiple tables and my read model is similar to my document's schema. I can always identify a user by his/her _id and login so whenever I need to update user's info I can update just a single document. Moreover usually I don't want to override user's data, but rather append it so a document creates a user's log. I just update it in the very rare case (when the data is nullable but will be updated only once (details) or I don't care about intermediate state (details.verified).
Problem: Whenever I receive an info about a given user, after matching it I want to check if the user should be modified and update it in a single atomic query/update. The problem is I may have several fields or array entries to check.
In a given example I want to:
Another problem is that I can have e.g. just an information about a new post (that can also be a duplicate) for a given user, but it doesn't exist yet thus I need to upsert it.
So I want to cover few use cases that will update user's data in a single query, e.g.:
Updates based on a single criteria seems to be strait forward. Without aggregation I can update a document with only one match criteria. Aggregation pipeline can contain several stages, but I am not certain I can use push stage to append to an array if certain criteria is met without a group stage. Upsert with conditional push is also problematic. I end up by ignoring DuplicateKeyException (so far without aggregation, because my service's logic relay on information if we deal with a duplicate). Maybe you know a better solution.
https://www.mongodb.com/docs/manual/reference/operator/aggregation/push/
A question if this approach is doable? I want to avoid replacing a document. I would be appreciate for some guideline and/or example queries.
A simplified document:
[
{
"_id": 123,
"login": "john",
"details": {
"a": "abc",
"verified": true
},
"locations": [
{
"location": {
"country": "UK",
"region": "foo",
"city": "bar"
},
"createdAt": ISODate("2020-01-22T21:21:41.052Z")
}
],
"profiles": [
{
"preference": {
"a": "foo",
"b": "bar"
},
"details": {
"a": "foo",
"b": "bar"
},
"hash": "hash123",
"createdAt": ISODate("2020-01-22T21:21:41.052Z")
}
],
"posts": [
{
"postId": 234,
"message": "foo bar",
"createdAt": ISODate("2020-01-22T21:21:41.052Z")
},
{
"postId": 345,
"message": "bar foo",
"createdAt": ISODate("2020-01-22T21:21:41.052Z")
}
]
}
]
Your 3 updates of locations
, profiles
, and posts
shared the same update pattern and can be solved with the below update:
$let
$cond
; update or leave the current array as-is based on the result. If update is needed, $concatArrays
with current array and a single-sized array which holds your input variable.It may seem abstract so here is a concrete example of locations
{
"$set": {
"locations": {
"$let": {
"vars": {
// your location input here
"locationInput": {
"country": "TEST",
"region": "test",
"city": "test"
}
},
"in": {
"$cond": {
"if": {
$ne: [
"$$locationInput",
// last location
{
$last: "$locations.location"
}
]
},
"then": {
"$concatArrays": [
"$locations",
[
"$$locationInput"
]
]
},
"else": "$locations"
}
}
}
}
}
}
Then just repeat the $set
3 times for 3 cases. In the playground, they are separated for clarity/readability. You can put them into a single $set
if preferred.
EDIT: to handle the case in which the document does not exist and you want to insert it. You can $unionWith
the "template" document you want to ensure at least one document for that _id exists. Then use $first
to pick the first document. If the document already exists, it would be the first document and it would be picked. If it does not exist, the template document will be picked. Then you can continue with the update above. Finally, use $merge
to update to the collection.
{
"$match": {
"_id": 123
}
},
{
"$unionWith": {
"coll": "collection",
"pipeline": [
{
$documents: [
// put the "template" you want to insert here
{
"_id": 123,
"login": "john",
"details": {
"a": "abc",
"verified": true
},
"locations": [],
"profiles": [],
"posts": []
}
]
}
]
}
},
{
$group: {
_id: "$_id",
// if the user exists, it will be the first document picked
doc: {
$first: "$$ROOT"
}
}
},
{
"$replaceRoot": {
"newRoot": "$doc"
}
},
... // remaining updates above
{
"$merge": {
"into": "collection",
"on": "_id"
}
}