Let's say I want to achieve maximum usable capacity with data resilience on this setup of 3 OSD nodes, where each node contains 2x 1TB OSDs.
Is it safe to run 3 Ceph nodes with 2-way replication?
What are the pros and cons of using 2-way replication? Will it cause data split-brain?
Last but not least, what failure-domain fault tolerance will I get running 2-way replication?
Thanks!
Sometimes even three replicas are not enough, e.g. if SSD disks (from cache) fail together or one by one.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005672.html
With two OSDs you can even manually set 1 replica as the minimum and 2 replicas as the maximum (I did not manage to get this applied automatically for the case where one of all three OSDs has failed):
osd pool default size = 2      # Write an object 2 times
osd pool default min size = 1  # Allow writing 1 copy in a degraded state
But this command: ceph osd pool set mypoolname min_size 1
sets it for an existing pool, not just the default settings.
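As a minimal sketch, the per-pool values can be applied and then verified from the CLI like this (the pool name mypoolname is just a placeholder, substitute your own):

ceph osd pool set mypoolname size 2       # keep 2 copies of every object
ceph osd pool set mypoolname min_size 1   # keep serving I/O with only 1 copy left
ceph osd pool get mypoolname size         # verify the replica count
ceph osd pool get mypoolname min_size     # verify the degraded-write threshold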
For n = 4 nodes, each with 1 OSD and 1 mon, and settings of replica min_size 1 and size 4, three OSDs can fail, but only one mon can fail (the monitor quorum requires more than half of the monitors to survive). 4 + 1 = 5 monitors are needed to tolerate two failed monitors (at least one of them should be external, without an OSD). With 8 monitors (four of them external), three mons can fail, so even three nodes, each with 1 OSD and 1 mon, can fail. I am not sure that a setup of 8 monitors is possible.
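For reference, here is a quick shell loop (my own illustration, not a Ceph command) that computes how many monitor failures a cluster of N monitors tolerates, since quorum needs a strict majority of floor(N/2) + 1:

for N in 3 4 5 7 8; do
  # quorum = strict majority; tolerated failures = N minus the quorum size
  echo "$N mons: quorum needs $((N/2 + 1)), tolerates $((N - N/2 - 1)) failures"
done

This prints, among others, "5 mons: quorum needs 3, tolerates 2 failures" and "8 mons: quorum needs 5, tolerates 3 failures", which matches the counts above.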
Thus, for three nodes, each with one monitor and one OSD, the only reasonable settings are replica min_size 2 and size 3 or 2. Only one node can fail.
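As a sketch, the corresponding ceph.conf defaults for that three-node case (assuming you want the safer size 3 variant) would be:

osd pool default size = 3      # one copy per node
osd pool default min size = 2  # block I/O when fewer than 2 copies are available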
If you have external monitors, and you set min_size to 1 (this is very dangerous) and size to 2 or 1, then 2 nodes can be down. But with one replica (no copy, only the original data) you can lose your job very soon.