
Ceph: What happens when enough disks fail to cause data loss?


Imagine that we have enough disk failures in Ceph to cause actual loss of data (e.g. all 3 replicas fail in a 3-replica pool, or more than m chunks fail in k+m erasure coding). What happens now?

  1. Is the cluster still stable? That is, we know that data is gone, but will other data, and new data, still work well?

  2. Is there any way to get a list of the lost object ids?

In our use case, we could recover the lost data from offline backups. But to do that, we'd need to know which data was actually lost - that is, to get a list of the object IDs that were lost.


Solution

  • Answer 1: what happens when data is lost?

    Ceph distributes your data in placement groups (PGs). Think of them as shards of your data pool. By default each PG is stored as 3 copies spread over your storage devices (the pool's size setting), and by default at least 2 of those copies (the pool's min_size) have to be known by Ceph for the PG to stay accessible. Should only 1 copy be available (because 2 OSDs, i.e. disks, are offline), writes to that PG are blocked until the minimum number of copies (2) is online again. If all copies of a PG are offline, reads from it are blocked until one copy comes back online. All other PGs remain accessible as long as they are online with enough copies; the first sketch below shows how to read these settings for a pool.

    Answer 2: which objects are affected?

    You are probably referring to the S3-like object storage (RadosGW). This is modelled on top of the RADOS object store, which is the core storage layer of Ceph. Problematic PGs can be traced and associated with the RADOS objects they contain. There is documentation about identifying blocked RadosGW requests, and another section about getting from defective PGs to the underlying RADOS objects; the second sketch below illustrates that PG-to-object step.
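To make answer 1 concrete, here is a minimal sketch (not part of the original answer) that reads the two pool settings involved, size and min_size, through the standard ceph CLI with JSON output. It assumes the ceph command and a working ceph.conf/keyring are available on the host; the pool name "mypool" is a placeholder, and the exact JSON fields can vary slightly between Ceph releases.

```python
#!/usr/bin/env python3
"""Read the replication settings that control the blocking behaviour
described in answer 1. Assumes the 'ceph' CLI is installed and the
client has admin credentials; 'mypool' is a placeholder pool name."""

import json
import subprocess

POOL = "mypool"  # placeholder: substitute your pool name


def pool_get(pool: str, var: str) -> int:
    """Run 'ceph osd pool get <pool> <var>' and return the value."""
    out = subprocess.check_output(
        ["ceph", "osd", "pool", "get", pool, var, "--format", "json"]
    )
    # The JSON output contains the requested variable as a key,
    # e.g. {"size": 3}; field layout may differ slightly per release.
    return json.loads(out)[var]


if __name__ == "__main__":
    size = pool_get(POOL, "size")          # copies kept per PG (default 3)
    min_size = pool_get(POOL, "min_size")  # copies needed to serve writes (default 2)
    print(f"pool {POOL}: size={size} min_size={min_size}")
    print(f"writes to a PG block once fewer than {min_size} copies are available")
```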
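And a companion sketch for answer 2: once you have identified the affected placement groups yourself (for example from ceph health detail, which reports PGs with unfound objects), ceph pg <pgid> list_unfound prints the missing RADOS object names. For RadosGW data those RADOS names still have to be mapped back to S3 keys, which is what the referenced documentation covers. The PG IDs below are placeholders, and the JSON field layout can differ between Ceph releases, so the parsing is deliberately defensive.

```python
#!/usr/bin/env python3
"""List the RADOS object names that a PG reports as unfound.
Assumes the 'ceph' CLI and admin credentials are available; the PG IDs
are placeholders you would take from 'ceph health detail'."""

import json
import subprocess

PG_IDS = ["2.5", "2.1a"]  # placeholders: PGs flagged as having unfound objects


def list_unfound(pgid: str) -> list[str]:
    """Return the object names reported by 'ceph pg <pgid> list_unfound'."""
    out = subprocess.check_output(
        ["ceph", "pg", pgid, "list_unfound", "--format", "json"]
    )
    data = json.loads(out)
    names = []
    for obj in data.get("objects", []):
        # The object name usually sits at obj["oid"]["oid"]; fall back to
        # dumping the raw entry if a release lays the JSON out differently.
        oid = obj.get("oid", {})
        names.append(oid.get("oid", json.dumps(obj)))
    return names


if __name__ == "__main__":
    for pgid in PG_IDS:
        lost = list_unfound(pgid)
        print(f"PG {pgid}: {len(lost)} unfound object(s)")
        for name in lost:
            print(f"  {name}")
```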