I added a step in my application to persist files via GridFS, along with a metadata field called "processed" that acts as a flag for a scheduled task. The task retrieves each new file and sends it on for processing. Since the Java driver for GridFS doesn't have a method for updating metadata, I use a MongoCollection for the "fs.files" collection to set "metadata.processed" to true.
I use GridFSBucket.find(eq("metadata.processed", false)) to get the new files for processing, and then update metadata.processed to true once processing is completed. This works if I add a new file while the application is running. However, if I have an existing file with "metadata.processed" set to false and start the application, the above find call returns no results. Similarly, if I have a file that was already processed and I set its "metadata.processed" field back to false, the find call also returns no results.
// MessageFormat templates; '{' and '}' are quoted so they are emitted literally.
private static final String FILTER_STR = "'{'\"filename\" : \"{0}\"'}'";
private static final String UPDATE_STR =
    "'{'\"$set\": '{'\"metadata.processed\": \"{0}\"'}}'";

@Autowired
private GridFSBucketFactory gridFSBucketFactory;

@Autowired
private MongoCollectionFactory mongoCollectionFactory;

public void storeFile(String filename, DateTime publishTime,
        InputStream inputStream) {
    if (fileExists(filename)) {
        LOGGER.info("File named {} already exists.", filename);
    } else {
        uploadToGridFS(filename, publishTime, inputStream);
        LOGGER.info("Stored file named {}.", filename);
    }
}

public GridFSDownloadStream getFile(BsonValue id) {
    return gridFSBucketFactory.getGridFSBucket().openDownloadStream(id);
}

public GridFSDownloadStream getFile(String filename) {
    final GridFSFile file = getGridFSFile(filename);
    return file == null ? null : getFile(file.getId());
}

// Returns the files whose "processed" flag is the boolean false.
public GridFSFindIterable getUnprocessedFiles() {
    return gridFSBucketFactory.getGridFSBucket()
        .find(eq("metadata.processed", false));
}

// Updates the "processed" flag directly in the "fs.files" collection.
public void setProcessed(String filename, boolean isProcessed) {
    final BasicDBObject filter =
        BasicDBObject.parse(format(FILTER_STR, filename));
    final BasicDBObject update =
        BasicDBObject.parse(format(UPDATE_STR, isProcessed));
    if (updateOne(filter, update)) {
        LOGGER.info("Set metadata for {} to {}", filename, isProcessed);
    }
}

private void uploadToGridFS(String filename, DateTime publishTime,
        InputStream inputStream) {
    gridFSBucketFactory.getGridFSBucket().uploadFromStream(filename,
        inputStream, createMetadata(publishTime));
}

private GridFSUploadOptions createMetadata(DateTime publishTime) {
    final Document metadata = new Document();
    metadata.put("processed", false);
    // metadata.put("publishTime", publishTime.toString());
    return new GridFSUploadOptions().metadata(metadata);
}

private boolean fileExists(String filename) {
    return getGridFSFile(filename) != null;
}

private GridFSFile getGridFSFile(String filename) {
    return gridFSBucketFactory.getGridFSBucket()
        .find(eq("filename", filename)).first();
}

private boolean updateOne(BasicDBObject filter, BasicDBObject update) {
    try {
        mongoCollectionFactory.getFsFilesCollection().updateOne(filter,
            update, new UpdateOptions().upsert(true));
    } catch (final MongoException e) {
        // Use {} placeholders, matching the other LOGGER calls in this class.
        LOGGER.error(
            "The following failed to update, filter:{} update:{}",
            filter, update, e);
        return false;
    }
    return true;
}
Any idea what I can do to ensure that
GridFSBucket.find(eq("metadata.processed", false))
returns the proper results for existing files and for files whose metadata has been changed?
The issue was that my update stored the metadata.processed value as a String rather than a boolean.
When initially creating the metadata I set its value with a boolean:
private GridFSUploadOptions createMetadata(DateTime publishTime) {
    final Document metadata = new Document();
    metadata.put("processed", false);
    // metadata.put("publishTime", publishTime.toString());
    return new GridFSUploadOptions().metadata(metadata);
}
And later I check for a boolean:
public GridFSFindIterable getUnprocessedFiles() {
    return gridFSBucketFactory.getGridFSBucket()
        .find(eq("metadata.processed", false));
}
But when updating the metadata using the "fs.files" MongoCollection I incorrectly added quotes around the boolean value here:
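private static final String UPDATE_STR =
    "'{'\"$set\": '{'\"metadata.processed\": \"{0}\"'}}'";
This caused the metadata value to be saved as the String "false" or "true" instead of a boolean. Because MongoDB equality matching is type-sensitive, the boolean filter in getUnprocessedFiles() never matched any file that had gone through setProcessed().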
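To make the difference concrete, here is a minimal sketch that renders the update template with and without the quotes around {0}. It uses only java.text.MessageFormat; the class name UpdateTemplateDemo and the render helper are made up for illustration and are not part of my application.

```java
import java.text.MessageFormat;

public class UpdateTemplateDemo {
    // Template from the question: the quotes around {0} turn the value into a JSON string.
    static final String QUOTED =
        "'{'\"$set\": '{'\"metadata.processed\": \"{0}\"'}}'";

    // Same template without those quotes: the value is rendered as a JSON boolean.
    static final String UNQUOTED =
        "'{'\"$set\": '{'\"metadata.processed\": {0}'}}'";

    // Hypothetical helper: fills the template the same way setProcessed() does.
    static String render(String template, boolean processed) {
        return MessageFormat.format(template, processed);
    }

    public static void main(String[] args) {
        // prints {"$set": {"metadata.processed": "false"}}
        System.out.println(render(QUOTED, false));
        // prints {"$set": {"metadata.processed": false}}
        System.out.println(render(UNQUOTED, false));
    }
}
```

Dropping the quotes in UPDATE_STR is the smallest change. Building the update with the driver's Updates.set("metadata.processed", isProcessed) and Filters.eq("filename", filename) would probably be more robust, since it avoids string templating altogether, but I have not shown that here. Files that were already updated with the String value would still need their flag rewritten as a boolean before the find call picks them up.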