Search code examples
rmongodbupsertmongolite

R mongolite Update collection with new dataframe by id (upsert)


I have a dataset stored in a MongoDB collection i.e. df_old.
Now I want to add new data i.e. df_new and I want to update the collection with them.
All data have a unique identifier, see column id.
If an id is already in the collection, then this value should not be inserted. Otherwise insert all other values.
Here is my example code which creates the existing and the new data and already inserts the existing data in the MongoDB database:

library(jsonlite)
library(tidyverse)
library(mongolite)

df_old <- tribble(
  ~id, ~colA, ~colB, ~colC, ~colD,
  1, 23, 45, "Value g", 8,
  2, 67, 56, "Value h", 3,
  3, 26, 12, "Value w", 7,
  4, 98, 85, "Value h", 9
)

df_new <- tribble(
  ~id, ~colA, ~colB, ~colC, ~colD,
  2, 67, 56, "Value h", 3,
  5, 85, 56, "Value s", 5,
  6, 23, 16, "Value t", 2
)

m = mongo(db = "DB_test",collection = "my_collection",url= "mongodb://localhost:27017")

m$insert(df_old)

Now I want to add the new values to the collection.
However, I am not sure how to specify the m$update() command.

m$update()

Can someone help me?
My desired output/collection should look like this:

# A tibble: 4 × 5
     id  colA  colB colC     colD
  <dbl> <dbl> <dbl> <chr>   <dbl>
1     1    23    45 Value g     8
2     2    67    56 Value h     3
3     3    26    12 Value w     7
4     4    98    85 Value h     9
5     5    85    56 Value s     5
6     6    23    16 Value t     2

The row with id=2 of the df_new is not inserted, because it already exists in the collection.


Solution

  • After a lot of trial and error, I found a solution that meets my requirements. It is not very pretty, but it works as I wanted it to.

    First you have to transform each row of the dataset into two JSON strings: one with only the id and one with all values. Then you pass the transposed list of the two vectors to the pmap function, which contains the upsert command.

    df_new %>% 
      rowwise() %>% 
      group_split() %>% 
      map(., ~ c(.x %>% pull(id) ,rjson::toJSON(.x))) %>% 
      transpose() %>% 
      pmap(., ~ m$update(paste('{"id" : ', ..1, '}'), paste('{"$set":', ..2, '}'), upsert = TRUE ))
    

    The result looks as desired:

    > m$find() %>% tibble()
    
    # A tibble: 8 × 5
         id  colA  colB colC     colD
      <dbl> <dbl> <dbl> <chr>   <dbl>
    1     1    23    45 Value g     8
    2     2    67    56 Value h     3
    3     3    26    12 Value w     7
    4     4    98    85 Value h     9
    5     5    85    56 Value s     5
    6     6    23    16 Value t     2