ceph radosgw

Ceph: which OSD is selected to actually return the data on a read (Ceph logic)?


I use nginx->radosgw->Ceph cluster, where every piece of stored data is kept on 3 OSDs simultaneously (each OSD is on a separate OSD server). The whole cluster contains 9 OSD servers. Ceph v10, if that matters.

Say my piece of data is a small 5 KiB file. The cluster is in an OK state.

QUESTION: When I GET (request) my piece of data from my cluster via nginx->radosgw, which OSD is selected to read the actual data from the SSDs?

  1. Is it ONE "main" OSD that returns the whole 5 KiB of data?

  2. Is it ALL 3 OSDs holding this piece, each returning the whole 5 KiB simultaneously?

  3. Is it ANY one of the 3 OSDs holding this piece: the data can come from any of the 3, but only the ONE selected OSD returns the whole 5 KiB?

  4. Is it ANY of the 3 OSDs holding this piece, each returning a part, say, osd1 returns 1 KiB + osd2 returns 3 KiB + osd8 returns 1 KiB = 5 KiB in total?

What is the logic?

Thanks for your patience in reading the options above. Thanks in advance for the answers.


Solution

  • QUESTION: When I GET (request) my piece of data from my cluster via nginx->radosgw, which OSD is selected to read the actual data from the SSDs?

    The client always addresses read and write requests to the primary OSD of the object's placement group. The primary OSD is responsible for any further work that is needed.

    So for a replicated pool, the primary OSD replies to the request directly, relying only on its local storage. The whole object is read from the primary OSD; no other OSD is involved. This corresponds to option 1 in the question.
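To see which OSD is the primary for a given object, `ceph osd map <pool> <object>` reports the placement group and its acting set. The snippet below is a minimal sketch that picks the primary out of that command's JSON output. The sample JSON is hand-written for illustration (pool name, object name, PG id and OSD ids are all made up), and the field names `acting` and `acting_primary` are assumed to match what your Ceph version prints with `--format json`, so check them against real output. Note also that the RADOS object name radosgw uses may differ from the S3 key.

```python
import json


def acting_primary(osd_map_json: str) -> int:
    """Return the id of the OSD that serves reads for the object's PG."""
    info = json.loads(osd_map_json)
    # Prefer the explicit field when present (assumed field name).
    if "acting_primary" in info:
        return info["acting_primary"]
    # Otherwise fall back to the first OSD in the acting set,
    # which is the primary by convention.
    return info["acting"][0]


# Hand-written sample, NOT captured from a real cluster.
sample = (
    '{"pool": "example.rgw.buckets.data", "objname": "myobject", '
    '"pgid": "11.2f", "up": [4, 1, 7], "up_primary": 4, '
    '"acting": [4, 1, 7], "acting_primary": 4}'
)

print(acting_primary(sample))  # prints 4
```

In this made-up example the object lives on OSDs 4, 1 and 7, and a GET for it would be served entirely by OSD 4.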

    For erasure-coded pools, the client also requests the data from the primary OSD. Once the primary OSD has received the data chunks from the other OSDs involved, it serves the complete object to the client. If chunks are missing, the primary OSD also fetches parity chunks in order to decode the data.
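The erasure-coded read path can be illustrated with a toy model. This is only a sketch of the idea, not Ceph's implementation: it assumes k=2 data chunks and m=1 parity chunk, uses plain XOR as the "code" (Ceph uses pluggable erasure-code plugins), and all function names are hypothetical. The primary collects the chunks it can get and falls back to the parity chunk when a data chunk is missing.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally long byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes) -> dict:
    """Split data into 2 data chunks (shards 0, 1) plus 1 XOR parity chunk (shard 2)."""
    half = (len(data) + 1) // 2
    d0 = data[:half]
    d1 = data[half:].ljust(half, b"\0")  # pad so both chunks have equal length
    return {0: d0, 1: d1, 2: xor_bytes(d0, d1)}


def primary_read(available: dict, length: int) -> bytes:
    """Toy model of the primary assembling an object from the shards it received.

    `available` maps shard index -> chunk; unreachable shards are simply absent.
    """
    d0, d1, parity = available.get(0), available.get(1), available.get(2)
    if d0 is None or d1 is None:
        # A data chunk is missing: the parity chunk is needed to decode it.
        survivor = d0 if d0 is not None else d1
        if survivor is None or parity is None:
            raise IOError("too many chunks missing to decode the object")
        rebuilt = xor_bytes(survivor, parity)
        d0, d1 = (rebuilt, d1) if d0 is None else (d0, rebuilt)
    # Concatenate the data chunks and strip the padding.
    return (d0 + d1)[:length]


obj = b"hello ceph!"
shards = encode(obj)

# Healthy case: both data chunks are available, parity is not needed.
assert primary_read(shards, len(obj)) == obj

# Degraded case: the OSD holding shard 1 is down, so parity is used to decode.
degraded = {i: c for i, c in shards.items() if i != 1}
assert primary_read(degraded, len(obj)) == obj
```

In both cases the client talks only to the primary; the gathering and decoding happen behind it.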