I'm trying to write a Java function that inserts a list of words into a collection. I want one document per word, with "word" as a unique field. The list contains many duplicates, so the function should insert a document only if the collection does not already contain one with the same "word"-value. If there's already a document with the same "word"-value the function should not change or replace this document but go on inserting the next word from my list.
I created a unique index on the field "word" to prevent duplicate documents, and I catch the duplicate key exception, but I'm not sure this is the right way to handle the issue.
IndexOptions uniqueWord = new IndexOptions().unique(true);
collection.createIndex(Indexes.ascending("word"), uniqueWord);
try {
    File file = new File("src/words.txt");
    Scanner scanner = new Scanner(file);
    // hasNext() pairs with next(); hasNextLine() can report a trailing empty line that next() cannot read
    while (scanner.hasNext()) {
        String word = scanner.next();
        Document document = new Document();
        document.put("word", word);
        InsertManyOptions unordered = new InsertManyOptions();
        ArrayList<Document> docs = new ArrayList<>();
        docs.add(document);
        try {
            collection.insertMany(docs, unordered.ordered(false));
        } catch (Exception e) {
            // expected case: duplicate key error because the word already exists (this also swallows any other exception)
            //System.out.println(e.getMessage());
        }
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
You wrote:
If there's already a document with the same "word"-value the function should not change or replace this document but go on inserting the next word from my list.
This rules out the use of an atomic operation such as findOneAndUpdate or findOneAndReplace with upsert: true.
Instead, I think your options are limited to a pre-write check such as:
if (collection.count(Filters.eq("word", "...")) == 0L) {
    // insert
} else {
    // ignore because there is already a document for this word
}
This is subject to race conditions if your writer is multi-threaded, e.g. while one thread is reacting to a zero result from collection.count(), another thread manages to write an entry for that word. findOneAndReplace is atomic, so it is not prone to that issue. I'd suggest that you use findOneAndReplace with FindOneAndReplaceOptions.upsert == true. This will have the same eventual outcome as ignoring a document which has already been written (albeit by replacing it with an identical document), but it is perhaps safer than applying a pre-write if-exists check.
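The difference between a check-then-insert and an atomic operation can be illustrated without a database. The following is a minimal in-memory sketch, not MongoDB code: a ConcurrentHashMap stands in for the collection, and the names AtomicInsertSketch, insertIfAbsent and runConcurrently are made up for this illustration. putIfAbsent performs the existence check and the write as a single step, so several threads inserting the same words should still produce exactly one entry per word.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory stand-in for the collection; not part of the MongoDB driver.
final class AtomicInsertSketch {
    // Key is the "word" value; the map plays the role of the unique index.
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();
    private final AtomicInteger inserts = new AtomicInteger();

    // Check and write happen as one atomic step, so no other thread can slip in between.
    boolean insertIfAbsent(String word) {
        boolean inserted = store.putIfAbsent(word, word) == null;
        if (inserted) {
            inserts.incrementAndGet();
        }
        return inserted;
    }

    int insertCount() {
        return inserts.get();
    }

    // Every thread tries to insert the whole word list; returns how many inserts succeeded.
    static int runConcurrently(List<String> words, int threads) {
        AtomicInsertSketch sketch = new AtomicInsertSketch();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> words.forEach(sketch::insertIfAbsent));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.insertCount();
    }
}
```

A separate containsKey() call followed by put() would correspond to the count-then-insert pattern: two threads can both see "absent" before either one writes.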
Update: your edited question implies that you are 'inserting many', but each time around the loop you insert only one document (despite using collection.insertMany()), so the above suggestion is still valid. For example:
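// hasNext() pairs with next(); hasNextLine() can report a trailing empty line that next() cannot read
while (scanner.hasNext()) {
    String word = scanner.next();
    // insert only if no document for this word exists yet
    if (collection.count(Filters.eq("word", word)) == 0L) {
        Document document = new Document();
        document.put("word", word);
        collection.insertOne(document);
    }
}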
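Because the word list itself contains many duplicates, it may also help to collapse them in memory before talking to the database, so that each distinct word costs at most one round trip. This is a minimal sketch using only the JDK; the class and method names (WordListSketch, uniqueWords) are made up for illustration, and it assumes the whole word list fits in memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

// Hypothetical helper; not part of the MongoDB driver.
final class WordListSketch {
    // Reads whitespace-separated words and returns each distinct word once,
    // in the order of first appearance.
    static List<String> uniqueWords(Readable source) {
        Set<String> seen = new LinkedHashSet<>();
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                seen.add(scanner.next()); // add() ignores words already in the set
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> words = uniqueWords(new StringReader("apple pear apple\nplum pear\n"));
        System.out.println(words);
    }
}
```

This only removes duplicates within the input list. The check against the collection (or the unique index) is still needed, since the collection may already contain some of the words.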