Find not working for GridFS after updating metadata


I added a step in my application that persists files via GridFS, and I added a metadata field called "processed" to act as a flag for a scheduled task that retrieves each new file and sends it on for processing. Since the GridFS API in the Java driver has no method for updating metadata, I used a MongoCollection for the "fs.files" collection to set "metadata.processed" to true.

I use GridFSBucket.find(eq("metadata.processed", false)) to get the new files for processing, and then set metadata.processed to true once processing is complete. This works if I add a new file while the application is running. However, if an existing file already has "metadata.processed" set to false when I start the application, the find call above returns no results. Similarly, if I take a file that was already processed and set its "metadata.processed" field back to false, the find call no longer returns it either.

private static final String FILTER_STR = "'{'\"filename\" : \"{0}\"'}'";

private static final String UPDATE_STR =
        "'{'\"$set\": '{'\"metadata.processed\": \"{0}\"'}}'";

@Autowired
private GridFSBucketFactory gridFSBucketFactory;

@Autowired
private MongoCollectionFactory mongoCollectionFactory;

public void storeFile(String filename, DateTime publishTime,
        InputStream inputStream) {

    if (fileExists(filename)) {
        LOGGER.info("File named {} already exists.", filename);
    } else {
        uploadToGridFS(filename, publishTime, inputStream);
        LOGGER.info("Stored file named {}.", filename);
    }
}

public GridFSDownloadStream getFile(BsonValue id) {
    return gridFSBucketFactory.getGridFSBucket().openDownloadStream(id);
}

public GridFSDownloadStream getFile(String filename) {
    final GridFSFile file = getGridFSFile(filename);
    return file == null ? null : getFile(file.getId());
}

public GridFSFindIterable getUnprocessedFiles() {
    return gridFSBucketFactory.getGridFSBucket()
            .find(eq("metadata.processed", false));
}

public void setProcessed(String filename, boolean isProcessed) {
    final BasicDBObject filter =
            BasicDBObject.parse(format(FILTER_STR, filename));
    final BasicDBObject update =
            BasicDBObject.parse(format(UPDATE_STR, isProcessed));
    if (updateOne(filter, update)) {
        LOGGER.info("Set metadata for {} to {}", filename, isProcessed);
    }
}

private void uploadToGridFS(String filename, DateTime publishTime,
        InputStream inputStream) {
    gridFSBucketFactory.getGridFSBucket().uploadFromStream(filename,
            inputStream, createMetadata(publishTime));
}

private GridFSUploadOptions createMetadata(DateTime publishTime) {
    final Document metadata = new Document();
    metadata.put("processed", false);
    // metadata.put("publishTime", publishTime.toString());
    return new GridFSUploadOptions().metadata(metadata);
}

private boolean fileExists(String filename) {
    return getGridFSFile(filename) != null;
}

private GridFSFile getGridFSFile(String filename) {
    return gridFSBucketFactory.getGridFSBucket()
            .find(eq("filename", filename)).first();
}

private boolean updateOne(BasicDBObject filter, BasicDBObject update) {

    try {
        mongoCollectionFactory.getFsFilesCollection().updateOne(filter,
                update, new UpdateOptions().upsert(true));
    } catch (final MongoException e) {
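        // SLF4J-style placeholders are {}, not {0}/{1}; the trailing
        // exception argument is logged with its stack trace.
        LOGGER.error(
                "The following failed to update, filter:{} update:{}",
                filter, update, e);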
        return false;
    }
    return true;
}
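
To narrow this down, it can help to print the JSON these patterns expand to before BasicDBObject.parse sees it. The sketch below assumes `format` is `java.text.MessageFormat.format` (the `'{'` quoting and `{0}` placeholders suggest so); the class name `PatternCheck` and the sample filename are made up for illustration. It uses only the JDK, so it runs without a database.

```java
import java.text.MessageFormat;

// Hypothetical diagnostic: expands the question's patterns exactly as
// MessageFormat would, so the type of each value can be inspected.
class PatternCheck {
    // Copied verbatim from the question.
    static final String FILTER_STR = "'{'\"filename\" : \"{0}\"'}'";
    static final String UPDATE_STR =
            "'{'\"$set\": '{'\"metadata.processed\": \"{0}\"'}}'";

    static String filterJson(String filename) {
        return MessageFormat.format(FILTER_STR, filename);
    }

    static String updateJson(boolean isProcessed) {
        // The boolean is autoboxed to Boolean; MessageFormat uses toString().
        return MessageFormat.format(UPDATE_STR, isProcessed);
    }

    public static void main(String[] args) {
        // prints {"filename" : "report.csv"}
        System.out.println(filterJson("report.csv"));
        // prints {"$set": {"metadata.processed": "false"}}
        System.out.println(updateJson(false));
    }
}
```

Because the printed update wraps the value in quotes, the JSON parser has no way to know it was meant to be a boolean.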

Any idea what I can do to ensure that

GridFSBucket.find(eq("metadata.processed", false))

returns the proper results for existing files and for files whose metadata has been changed?


Solution

  • The issue was caused by storing the metadata.processed value as a String instead of a boolean.

    When initially creating the metadata I set its value with a boolean:

    private GridFSUploadOptions createMetadata(DateTime publishTime) {
        final Document metadata = new Document();
        metadata.put("processed", false);
        // metadata.put("publishTime", publishTime.toString());
        return new GridFSUploadOptions().metadata(metadata);
    }
    

    And later I check for a boolean:

    public GridFSFindIterable getUnprocessedFiles() {
        return gridFSBucketFactory.getGridFSBucket()
            .find(eq("metadata.processed", false));
    }
    

    But when updating the metadata through the "fs.files" MongoCollection, I mistakenly wrapped the {0} placeholder (the boolean value) in quotes here:

    private static final String UPDATE_STR =
        "'{'\"$set\": '{'\"metadata.processed\": \"{0}\"'}}'";
    

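    That caused the value to be saved as the string "true" or "false" rather than a boolean. MongoDB equality matches take the BSON type into account, so the string "false" does not match eq("metadata.processed", false). That is why files updated through setProcessed stopped appearing in getUnprocessedFiles(), while files uploaded through createMetadata, which stores a real boolean, still matched.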
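
A minimal sketch of the fix, assuming `format` is `java.text.MessageFormat.format`: drop the quotes around `{0}` so the boolean is emitted as a bare JSON literal. The class and method names are hypothetical, and the extra filter pattern is an illustrative addition for finding documents that the old pattern already wrote as strings.

```java
import java.text.MessageFormat;

// Hypothetical helper: corrected update pattern plus a filter for
// documents written by the old, quoted pattern.
class ProcessedFlagFix {
    // {0} is no longer wrapped in quotes, so a Boolean argument expands to
    // a bare true/false literal, which parses to a BSON boolean.
    static final String UPDATE_STR =
            "'{'\"$set\": '{'\"metadata.processed\": {0}'}}'";

    // Matches documents where the old pattern stored the flag as the
    // string "true" or "false".
    static final String STRING_FLAG_FILTER_STR =
            "'{'\"metadata.processed\": \"{0}\"'}'";

    static String updateJson(boolean isProcessed) {
        return MessageFormat.format(UPDATE_STR, isProcessed);
    }

    static String stringFlagFilterJson(boolean value) {
        return MessageFormat.format(STRING_FLAG_FILTER_STR, value);
    }

    public static void main(String[] args) {
        // prints {"$set": {"metadata.processed": false}}
        System.out.println(updateJson(false));
        // prints {"metadata.processed": "false"}
        System.out.println(stringFlagFilterJson(false));
    }
}
```

To repair files already written with the quoted pattern, BasicDBObject.parse(stringFlagFilterJson(false)) and BasicDBObject.parse(updateJson(false)) could be passed to updateMany on the "fs.files" collection, and likewise for true. A sturdier option is to skip string templating altogether and use the driver's builders, for example Filters.eq("filename", filename) with Updates.set("metadata.processed", isProcessed), which stores a real boolean.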