I would like to backup a running rocksdb-instance to a location on the same disk in a way that is safe, and without interrupting processing during the backup.
I have read:
My question is whether the call to CreateNewBackupWithMetadata
is marked as NOT threadsafe to express, that two concurrent calls to this function will have unsafe behavior, or to indicate that ANY concurrent call on the database will be unsafe. I have checked the implementation, which appears to be creating a checkpoint - which the second article claims are used for online backups of MyRocks -, but I am still unsure, what part of the call is not threadsafe.
I currently interpret this as, it is unsafe, because CreateBackup...
calls DisableFileDeletions
and later EnableFileDeletions
, which, of course, if two overlapping calls are made, may cause trouble. Since the SST-files are immutable, I am not worried about them, but am unsure whether modifying the WAL through insertions can corrupt the backup. I would assume that triggering a flush on backup should prevent this, but I would like to be sure.
Any pointers or help are appreciated.
I ended up looking into the implementation way deeper, and here is what I found:
Recall a rocksdb database consists of Memtables, SSTs and a single WAL, which protects data in the Memtables against crashes.
When you call rocksdb::BackupEngine::CreateBackupWithMetadata
, there is no lock taken internally, so this call can race, if two calls are active at the same time. Most notably this call does Disable/EnableFileDeletions
, which, if called by one call, while another is still active spells doom for the other call.
The process of copying the files from the database to the backup is protected from modifications while the call is active by creating a rocksdb::Checkpoint
, which, if flush_before_backup
was set to true, will first flush the Memtables, thus clearing the active WAL.
Internally the call to CreateCustomCheckpoint
calls DB::GetLiveFiles
in db_filecheckpoint.cc
. GetLiveFiles
takes the global database lock (_mutex
), optionally flushes the Memtables, and retrieves the list of SSTs. If a flush in GetLiveFiles
happens while holding the global database-lock, the WAL must be empty at this time, which means the list should always contain the SST-files representing a complete and consistent database state from the time of the checkpoint. Since the SSTs are immutable, and since file deletion through compaction is turned off by the backup-call, you should always get a complete backup without holding writes on the database. However this, of course, means it is not possible to determine the exact last write/sequence number in the backup when concurrent updates happen - at least not without inspecting the backup after it has been created.
For the non-flushing version, there maybe WAL-files, which are retrieved in a different call than GetLiveFiles
, with no lock held in between, i.e. these are not necessarily consistent, but I did not investigate further, since the non-flushing case was not applicable to my use.