I have a dataset stored in a MongoDB collection i.e. df_old
.
Now I want to add new data i.e. df_new
and I want to update the collection with them.
All data have a unique identifier, see column id
.
If an id
is already in the collection, then this value should not be inserted. Otherwise insert all other values.
Here is my example code which creates the existing and the new data and already inserts the existing data in the MongoDB database:
library(jsonlite)
library(tidyverse)
library(mongolite)
df_old <- tribble(
~id, ~colA, ~colB, ~colC, ~colD,
1, 23, 45, "Value g", 8,
2, 67, 56, "Value h", 3,
3, 26, 12, "Value w", 7,
4, 98, 85, "Value h", 9
)
df_new <- tribble(
~id, ~colA, ~colB, ~colC, ~colD,
2, 67, 56, "Value h", 3,
5, 85, 56, "Value s", 5,
6, 23, 16, "Value t", 2
)
m = mongo(db = "DB_test",collection = "my_collection",url= "mongodb://localhost:27017")
m$insert(df_old)
Now I want to add the new values to the collection.
However, I am not sure how to specify the m$update()
command.
m$update()
Can someone help me?
My desired output/collection should look like this:
# A tibble: 4 × 5
id colA colB colC colD
<dbl> <dbl> <dbl> <chr> <dbl>
1 1 23 45 Value g 8
2 2 67 56 Value h 3
3 3 26 12 Value w 7
4 4 98 85 Value h 9
5 5 85 56 Value s 5
6 6 23 16 Value t 2
The row with id=2
of the df_new
is not inserted, because it already exists in the collection.
After a lot of trial and error, I found a solution that meets my requirements. It is not very pretty, but it works as I wanted it to.
First you have to transform each row of the dataset into two JSON strings: one with only the id
and one with all values.
Then you pass the transposed list of the two vectors to the pmap
function, which contains the upsert
command.
df_new %>%
rowwise() %>%
group_split() %>%
map(., ~ c(.x %>% pull(id) ,rjson::toJSON(.x))) %>%
transpose() %>%
pmap(., ~ m$update(paste('{"id" : ', ..1, '}'), paste('{"$set":', ..2, '}'), upsert = TRUE ))
The result looks as desired:
> m$find() %>% tibble()
# A tibble: 8 × 5
id colA colB colC colD
<dbl> <dbl> <dbl> <chr> <dbl>
1 1 23 45 Value g 8
2 2 67 56 Value h 3
3 3 26 12 Value w 7
4 4 98 85 Value h 9
5 5 85 56 Value s 5
6 6 23 16 Value t 2