I have a MongoDB collection with the following data (inserted using insert_many):
[
{"attr_name": "a", "values": [
{"value": "1", "embedding": [1,2,3]},
{"value": "2", "embedding": [2,3,4]},
{"value": "3", "embedding": [3,4,5]},
]},
{"attr_name": "b", "values": [
{"value": "1", "embedding": [1,2,3]},
{"value": "4", "embedding": [4,5,6]},
]},
{"attr_name": "c", "values": [
{"value": "6", "embedding": [6,7,8]},
{"value": "7", "embedding": [7,8,9]},
]},
]
I want to avoid duplicates on attr_name and value. This is enforced by:
collection.create_index(["attr_name", "value"], unique=True)
What I want is that, when new data is inserted and a document with a matching attr_name already exists, the new entries are appended to its values. Currently, if there's a matching attr_name, the entire entry is omitted.
For example: I have this:
[
{"attr_name": "a", "values": [
{"value": "1", "embedding": [1,2,3]},
{"value": "2", "embedding": [2,3,4]},
]},
{"attr_name": "b", "values": [
{"value": "1", "embedding": [1,2,3]},
{"value": "4", "embedding": [4,5,6]},
]},
]
I'm inserting this:
[
{"attr_name": "a", "values": [
{"value": "1", "embedding": [1,2,3]},
{"value": "5", "embedding": [5,6,7]},
{"value": "6", "embedding": [6,7,8]},
]},
{"attr_name": "c", "values": [
{"value": "6", "embedding": [6,7,8]},
{"value": "7", "embedding": [7,8,9]},
]},
]
I want this to be the final state:
[
{"attr_name": "a", "values": [
{"value": "1", "embedding": [1,2,3]},
{"value": "2", "embedding": [2,3,4]},
{"value": "5", "embedding": [5,6,7]}, # <---- appended
{"value": "6", "embedding": [6,7,8]}, # <---- appended
]},
{"attr_name": "b", "values": [
{"value": "1", "embedding": [1,2,3]},
{"value": "4", "embedding": [4,5,6]},
]},
{"attr_name": "c", "values": [
{"value": "6", "embedding": [6,7,8]},
{"value": "7", "embedding": [7,8,9]},
]},
]
I think you may need to issue individual update_one commands with upsert=True.
Perhaps something like:
my_updates = [
{"attr_name": "a", "values": [
{"value": "1", "embedding": [1,2,3]},
{"value": "5", "embedding": [5,6,7]},
{"value": "6", "embedding": [6,7,8]},
]},
{"attr_name": "c", "values": [
{"value": "6", "embedding": [6,7,8]},
{"value": "7", "embedding": [7,8,9]},
]},
]
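for update in my_updates:
    # Match on attr_name; with upsert=True a missing document is created.
    update_result = collection.update_one(
        filter={"attr_name": update["attr_name"]},
        # $addToSet with $each adds only the elements not already in the array.
        update={"$addToSet": {"values": {"$each": update["values"]}}},
        upsert=True
    )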
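To see what final state the loop above should produce, here is a minimal pure-Python sketch that mimics the $addToSet / $each / upsert behaviour on plain lists, with no database involved. The function name simulate_add_to_set_upsert is hypothetical and only for illustration. One caveat worth keeping in mind: $addToSet treats an embedded document as a duplicate only when the whole document matches exactly, so the same value with a different embedding would still be appended.

```python
import copy


def simulate_add_to_set_upsert(docs, updates):
    """Hypothetical helper: mimic update_one($addToSet/$each, upsert=True) in memory."""
    result = copy.deepcopy(docs)  # leave the caller's data untouched
    by_name = {doc["attr_name"]: doc for doc in result}
    for update in updates:
        doc = by_name.get(update["attr_name"])
        if doc is None:
            # upsert=True: no document matched the filter, so one is created.
            doc = {"attr_name": update["attr_name"], "values": []}
            result.append(doc)
            by_name[doc["attr_name"]] = doc
        for item in update["values"]:
            # Skip an element only if an identical one is already present.
            # (Python dict equality ignores key order; MongoDB also compares field order.)
            if item not in doc["values"]:
                doc["values"].append(item)
    return result


existing = [
    {"attr_name": "a", "values": [
        {"value": "1", "embedding": [1, 2, 3]},
        {"value": "2", "embedding": [2, 3, 4]},
    ]},
    {"attr_name": "b", "values": [
        {"value": "1", "embedding": [1, 2, 3]},
        {"value": "4", "embedding": [4, 5, 6]},
    ]},
]
incoming = [
    {"attr_name": "a", "values": [
        {"value": "1", "embedding": [1, 2, 3]},
        {"value": "5", "embedding": [5, 6, 7]},
        {"value": "6", "embedding": [6, 7, 8]},
    ]},
    {"attr_name": "c", "values": [
        {"value": "6", "embedding": [6, 7, 8]},
        {"value": "7", "embedding": [7, 8, 9]},
    ]},
]

final_state = simulate_add_to_set_upsert(existing, incoming)
for doc in final_state:
    print(doc["attr_name"], [v["value"] for v in doc["values"]])
```

With the data from the question, "a" ends up with values 1, 2, 5 and 6, "b" is unchanged, and "c" is created.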
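N.B.: You may want to inspect the properties of update_result (for example matched_count, modified_count and upserted_id) to see whether each call appended to an existing document, inserted a new one, or changed nothing.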
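As a rough sketch of that inspection, the loop could be wrapped in a helper that classifies each result. The function name apply_updates and the outcome labels are my own, not part of PyMongo; matched_count, modified_count and upserted_id are the UpdateResult properties it relies on, and collection is assumed to be a PyMongo collection.

```python
def apply_updates(collection, updates):
    """Hypothetical helper: upsert each update and report what happened to it."""
    summary = []
    for update in updates:
        result = collection.update_one(
            {"attr_name": update["attr_name"]},
            {"$addToSet": {"values": {"$each": update["values"]}}},
            upsert=True,
        )
        if result.upserted_id is not None:
            outcome = "inserted"   # no document matched, a new one was created
        elif result.modified_count:
            outcome = "appended"   # a document matched and its array grew
        else:
            outcome = "unchanged"  # a document matched but every value was already present
        summary.append((update["attr_name"], outcome))
    return summary
```

Calling apply_updates(collection, my_updates) would then return a list of (attr_name, outcome) pairs that you can log or assert on.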