I have a stock data frame that I have already loaded into MongoDB. As new information comes in, I append rows, where each new row is a different date. However, when I try to insert my new data frame I receive this error: Error: E11000 duplicate key error collection: timeSeries.patimeseries index: _id_ dup key: { _id: "5fb69a960438522a0631cca2" }
Here is what I am currently doing
con = mongo(collection, db, url)
#look at existing data
con$find()
date Price Value2
2020-10-10 50 AAPL
2020-10-10 88 MSFT
2020-10-11 52 AAPL
2020-10-11 54 MSFT
I receive new data, dated 2020-10-12, that I wish to insert.
print(new.df)
date price Value2
2020-10-12 56 AAPL
2020-10-12 92 MSFT
The first thing I do is remove the date "2020-10-12" from my db and then try to insert the new information.
con$remove(
query = '{"date" : "2020-10-12"}'
)
con$insert(
new.df
)
Error: E11000 duplicate key error collection: timeSeries.patimeseries index: _id_ dup key: { _id: "5fb69a960438522a0631cca2" }
It looks like you are removing the data but not the index on the data. The duplicate key error is on the automatic _id index, which exists on every collection and cannot be dropped. (See https://api.mongodb.com/wiki/current/Indexes.html#Indexes-The%5CidIndex) Instead of trying to remove the entries and insert new ones in their place, have you tried using an update query?
con$update('{"date" : "2020-10-12", "Value2" : "AAPL"}', '{"$set":{"price": 56}}')
See https://jeroen.github.io/mongolite/manipulate-data.html#update-upsert for documentation.
Note: if some of your queries will need to update existing records and others will need to insert new ones, you can pass upsert = TRUE to perform an insert whenever no matching record is found.
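For example, reusing the connection `con` from the question (the extra "Value2" condition is my assumption, so that rows sharing a date don't collide), an upsert looks like this:

```r
# Update the 2020-10-12 / AAPL row; upsert = TRUE inserts the
# document instead when no matching record exists yet.
q <- '{"date": "2020-10-12", "Value2": "AAPL"}'
u <- '{"$set": {"price": 56}}'
con$update(q, u, upsert = TRUE)
```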
Update: I do not see any documentation for executing a bulk update. If it is not possible, one workaround would be to convert the data frame to a list of JSON strings, one per row, like this:
library(rjson)
# split new.df into a list of one-row data frames, one per row
x <- split(new.df, 1:nrow(new.df))
# one JSON string per row, e.g. {"date":"2020-10-12","price":56,"Value2":"AAPL"}
json_strings <- lapply(x, toJSON)
In this example, each JSON string uses the column names as keys and one row of data as the corresponding values, so you could loop through the rows and do something like this:
for (val in seq_len(nrow(new.df)))
{
  # match on date and ticker so that rows sharing a date don't collide;
  # the values must be quoted to form valid JSON
  date_query <- paste0('{"date": "', new.df$date[val],
                       '", "Value2": "', new.df$Value2[val], '"}')
  set_query <- paste0('{"$set": ', json_strings[[val]], '}')
  con$update(date_query, set_query, upsert = TRUE)
}
There is probably a cleaner way to do all of this. You could refactor to remove the loop, for example, and use dplyr verbs (such as mutate) to construct the JSON strings, but the idea is to generate JSON strings from the data in the data frame and execute the update queries dynamically.
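A loop-free sketch of the same idea, built with vectorized string construction (the data frame literal below just reproduces new.df from the question; the final Map call over `con` is left commented since it needs a live connection):

```r
library(rjson)

# the same new.df as in the question, with dates as character strings
new.df <- data.frame(
  date   = c("2020-10-12", "2020-10-12"),
  price  = c(56, 92),
  Value2 = c("AAPL", "MSFT"),
  stringsAsFactors = FALSE
)

# one query and one {"$set": ...} body per row
rows    <- split(new.df, seq_len(nrow(new.df)))
queries <- sprintf('{"date": "%s", "Value2": "%s"}', new.df$date, new.df$Value2)
updates <- sprintf('{"$set": %s}', vapply(rows, toJSON, character(1)))

# run one upsert per row (connection `con` assumed from above):
# invisible(Map(function(q, u) con$update(q, u, upsert = TRUE), queries, updates))
```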