Ceph Bluestore checksums: What's the word on bitrot?


I'm getting ready to set up my first Ceph cluster (Luminous on Fedora) for production use. So far I've been running a single OSD per node on top of a large ZFS pool, so that I get checksum-on-read bitrot protection with automatic repair (when possible).

I did this because everything I've read suggests that Ceph doesn't really have bitrot protection as one of its goals, even with Bluestore. Deep scrubbing works, but it carries a heavy performance hit while running and, more importantly, leaves a window of time during which corrupt data can be read.

Today, though, I've read a few things about Bluestore around checksum-on-read that suggest I may have been incorrect. I cannot, however, find any documentation that seems to say authoritatively "this is what this does".

So hopefully this is a good outlet to ask: Can anybody speak with confidence on whether or not Bluestore provides bitrot detection and, with the help of other OSDs, automatic repair through its checksum mechanism?


Solution

  • BlueStore very much has bitrot protection as one of its goals. It stores checksums for every block and validates them on reads. If they’re bad, it throws errors rather than returning known-bad data; that triggers the higher-level RADOS recovery mechanisms.
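
To make the flow concrete, here is a toy Python sketch of the checksum-on-read idea described above. It is not BlueStore or RADOS code: the `Store` class, `ChecksumError`, and `replicated_read` are hypothetical names made up for illustration, and `zlib.crc32` stands in for whatever checksum the real store is configured to use. Each store records a checksum at write time, refuses to return data that fails verification, and a higher layer then falls back to another replica and rewrites the damaged copy.

```python
import zlib


class ChecksumError(IOError):
    """Raised instead of returning data that fails verification."""


class Store:
    """Toy single-replica store (hypothetical, not a Ceph API)."""

    def __init__(self):
        self.blocks = {}  # block id -> bytes
        self.csums = {}   # block id -> checksum recorded at write time

    def write(self, block_id, data):
        self.blocks[block_id] = data
        self.csums[block_id] = zlib.crc32(data)  # stand-in checksum

    def read(self, block_id):
        data = self.blocks[block_id]
        # Verify on every read; never hand back known-bad data.
        if zlib.crc32(data) != self.csums[block_id]:
            raise ChecksumError(f"checksum mismatch on block {block_id}")
        return data


def replicated_read(replicas, block_id):
    """Return the first copy that verifies.

    Copies that failed verification before a good one was found are
    rewritten from the good copy. Replicas after the first good one
    are not checked in this sketch.
    """
    bad = []
    for store in replicas:
        try:
            data = store.read(block_id)
        except ChecksumError:
            bad.append(store)
            continue
        for damaged in bad:
            damaged.write(block_id, data)
        return data
    raise ChecksumError(f"no valid copy of block {block_id}")
```

The point of the sketch is the division of labour: the store only detects and reports the bad block, while the repair happens one level up where the other copies are visible.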