
How to append to a Dexie entry using a rolling buffer (to store large entries without allocating GBs of memory)


I was redirected here after emailing the author of Dexie (David Fahlander). This is my question:

Is there a way to append to an existing Dexie entry? I need to store large things in Dexie, but I'd like to be able to fill large entries with a rolling buffer rather than allocating one huge buffer and then doing a single store.

For example, I have a 2 GB file I want to store in Dexie. I want to store that file 32 KB at a time into the same store, without having to allocate 2 GB of memory in the browser. Is there a way to do that? The put method seems to only overwrite entries.


Solution

  • Thanks for putting your question here at Stack Overflow :) This helps me build up an open knowledge base for everyone to access.

    There's no way in IndexedDB to update an entry without also instantiating the whole entry. Dexie adds the update() and modify() methods, but they only emulate a way to alter certain properties. In the background, the entire document will always be loaded into memory temporarily.

    IndexedDB also has Blob support, but when a Blob is stored into IndexedDB, its entire content is cloned/copied into the database by specification.
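
    To make that concrete, here is a minimal sketch (it uses the files table defined further down; fileId, the new file name and someBlob are made-up values):

    // update() only patches the `name` property, but behind the scenes
    // Dexie/IndexedDB still read and re-write the entire stored object:
    db.files.update(fileId, { name: 'newName.txt' });
    
    // Storing a Blob: by specification its entire content is cloned into
    // the database, so this does not avoid the copy cost either:
    db.files.add({ name: 'big.bin', content: someBlob });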

    So the best way to deal with this would be to dedicate a table for dynamic large content and add new entries to it.

    For example, let's say you have the tables "files" and "fileChunks". You need to incrementally grow the "file", and each time you do that, you don't want to instantiate the entire file in memory. You could then add the file chunks as separate entries into the fileChunks table.

    let db = new Dexie('filedb');
    db.version(1).stores({
        files: '++id, name',       // '++id' = auto-incremented primary key
        fileChunks: '++id, fileId' // 'fileId' is indexed so chunks can be queried per file
    });
    
    /** Returns a Promise with ID of the created file */
    function createFile (name) {
        return db.files.add({name});
    }
    
    /** Appends contents to the file */
    function appendFileContent (fileId, contentToAppend) {
        return db.fileChunks.add({ fileId, chunk: contentToAppend });
    }
    
    /** Read entire file */
    function readEntireFile (fileId) {
        return db.fileChunks.where('fileId').equals(fileId).toArray()
        .then(entries => {
            return entries.map(entry=>entry.chunk)
                .join(''); // join assumes the chunks are strings
        });
    }
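
    With those helpers in place, the 2 GB file from the question can be stored 32 KB at a time. Below is a usage sketch only, assuming the chunks are text; storeFileInChunks is a made-up helper name and file is a File/Blob, e.g. taken from an <input type="file"> element:

    async function storeFileInChunks (file) {
        const CHUNK_SIZE = 32 * 1024; // 32 KB per chunk, as in the question
        const fileId = await createFile(file.name);
        for (let pos = 0; pos < file.size; pos += CHUNK_SIZE) {
            // Blob.slice() is lazy; .text() reads only this 32 KB slice into memory
            const chunkText = await file.slice(pos, pos + CHUNK_SIZE).text();
            await appendFileContent(fileId, chunkText);
        }
        return fileId;
    }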
    

    Easy enough. If you want appendFileContent to be a rolling buffer (with a max size that erases old content), you could add a truncate method:

    function deleteOldChunks (fileId, maxAllowedChunks) {
        return db.fileChunks.where('fileId').equals(fileId)
            .reverse() // Newest chunks first, so the old ones end up after the offset
            .offset(maxAllowedChunks) // offset = skip the N newest chunks
            .delete(); // Deletes every chunk except the N most recent ones
    }
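
    To combine the two into an actual rolling buffer, you could then chain them; appendToRollingBuffer below is just a hypothetical wrapper, not part of Dexie:

    function appendToRollingBuffer (fileId, contentToAppend, maxAllowedChunks) {
        return appendFileContent(fileId, contentToAppend)
            .then(() => deleteOldChunks(fileId, maxAllowedChunks));
    }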
    

    You'd get other benefits as well, such as the ability to tail a stored file without loading its entire content into memory:

    /** Tail a file. This function is just an example of how
     * dynamically the data is stored and how simple it would be
     * to tail a file. */
    function tailFile (fileId, maxLines) {
        let result = [], numNewlines = 0;
        return db.fileChunks.where('fileId').equals(fileId)
            .reverse()
            .until(() => numNewlines >= maxLines)
            .each(entry => {
                result.unshift(entry.chunk);
                numNewlines += (entry.chunk.match(/\n/g) || []).length;
            })
        .then(() => {
            let lines = result.join('').split('\n')
                .slice(1); // First line may be cut off
            let overflowLines = lines.length - maxLines;
            return (overflowLines > 0 ?
                lines.slice(overflowLines) :
                lines).join('\n');
        });
    }
    

    The reason I know that the chunks will come back in the correct order in readEntireFile() and tailFile() is that IndexedDB queries are always returned primarily in the order of the queried index, and secondarily in the order of the primary keys, which here are auto-incremented numbers.
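
    If you prefer not to rely on that implicit ordering, the sort can also be spelled out explicitly on the primary key (a small variation of readEntireFile(), not something the code above requires):

    // Same query as in readEntireFile(), but sorted explicitly by primary key:
    db.fileChunks.where('fileId').equals(fileId)
        .sortBy('id')
        .then(entries => entries.map(entry => entry.chunk).join(''));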

    This pattern could be used for other cases as well, such as logging. If the file is not string based, you would have to alter this sample a little; specifically, don't use Array.prototype.join() or String.prototype.split().
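
    For example, a binary variant of readEntireFile() could concatenate typed arrays instead. This is a sketch that assumes each chunk was stored as a Uint8Array; readEntireBinaryFile is a made-up name:

    function readEntireBinaryFile (fileId) {
        return db.fileChunks.where('fileId').equals(fileId).toArray()
            .then(entries => {
                const chunks = entries.map(entry => entry.chunk); // Uint8Array chunks
                const totalLength = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
                const result = new Uint8Array(totalLength);
                let offset = 0;
                for (const chunk of chunks) {
                    result.set(chunk, offset);
                    offset += chunk.byteLength;
                }
                return result;
            });
    }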