Ceph RADOS linearizable?

Ralph Boehme slow at samba.org
Thu Mar 9 09:43:13 UTC 2023


Hi David,

On 3/8/23 21:38, David Disseldorp wrote:
> I think it's easier to express this as a protocol sequence diagram. As I
> understand things, Ceph omap will provide such guarantees for OSD
> acknowledged requests, assuming perfect time synchronisation between
> client clocks e.g.
> 
> t  Client         Client               primary             replica
> .    1               2                   OSD                OSDs
> .    |               |                    |                  |
> 1    |>set_omap(A=X)----->--------------->|                  |
> .    |               |                    |   set_omap(A=X)  |
> .    |               |                    |>-------->------->|
> .    |               |                    |                  |
> .    |               |                    |<---ack--<-------<|
> .    |               |                    |                  |
> .    |<---ack------------<---------------<|                  |
> .    |               |                    |                  |
> 2    |               |>get_omap(A)------->|                  |
> .    |               |                    |                  |
> .    |               |<-ack(A=X)---------<|                  |
> .    |               |                    |                  |
> 
> IIUC, if Client 2 above were to send its request before Client 1
> received the set_omap acknowledgement, then the get_omap response would
> either be A=X *or* the earlier value of A, although nothing in between.

Thanks! Yes, this is what replication looks like in systems that use a
strong leader. :)
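To make David's point concrete, here is a toy, single-process Python
sketch that enumerates every point at which client 2's get_omap could be
served while client 1's set_omap is still in flight. All names are
hypothetical and this is not librados code. It assumes, as the diagram
suggests, that the primary exposes the new value only after all replicas
have acked and that reads are always served by the primary. Under those
assumptions the read returns either the old value or X, never anything
else.

```python
from typing import Dict, Iterator, List, Set

# Hypothetical toy model of the set_omap/get_omap sequence; not librados/Ceph code.
# Assumption: the primary makes the new value visible only after every replica
# has applied (acked) the write, and all reads are served by the primary.

def set_omap_steps(primary: Dict[str, str], replicas: List[Dict[str, str]],
                   key: str, value: str) -> Iterator[None]:
    """One client write split into steps; each yield is a point where another
    client's get_omap could be served by the primary."""
    for replica in replicas:
        replica[key] = value      # primary forwards set_omap, replica applies and acks
        yield
    primary[key] = value          # all replica acks received: value becomes visible
    yield                         # ack sent back to client 1

def possible_reads(old: str, new: str, n_replicas: int) -> Set[str]:
    """Values a single concurrent get_omap("A") on the primary can return."""
    results = set()
    # Try every interleaving point, from before the first step to after the ack.
    for read_at in range(n_replicas + 2):
        primary = {"A": old}
        replicas = [{"A": old} for _ in range(n_replicas)]
        steps = set_omap_steps(primary, replicas, "A", new)
        for _ in range(read_at):
            next(steps)
        results.add(primary["A"])  # client 2's get_omap is served at this point
    return results

print(sorted(possible_reads("W", "X", n_replicas=2)))  # ['W', 'X']
```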

The problem is not that I don't understand how distributed databases
and the related consensus algorithms work. I was just wondering why
the consistency that librados provides when used as a k/v store is not
documented. Given the proliferation of distributed databases these days
that provide different consistency levels, I'd love to see products
clearly document what you can expect. So I guess I should really have
brought this up on ceph-devel and not here, as I'm barking up the
wrong tree. :)))

Digging some more, I found
https://repositorio-aberto.up.pt/bitstream/10216/139563/2/529181.pdf
which states that RADOS uses Multi-Paxos, which implies linearizability.

But this one still makes me nervous:
https://tracker.ceph.com/issues/50719. There, the client suddenly sees an
old value, something you'd expect from a Dynamo-style, eventually
consistent database (Cassandra, Scylla, Yugabyte, ...), but not from a
linearizable one.
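Whether a stale read like that violates linearizability can be checked
mechanically. Below is a minimal brute-force sketch for a single
register: try every total order of the operations that respects
real-time precedence (if A finished before B started, A comes first)
and check whether replaying it sequentially yields every read's result.
It is only practical for tiny histories, and the example histories are
hypothetical, not taken from the tracker issue.

```python
from itertools import permutations
from typing import List, NamedTuple, Optional

class Op(NamedTuple):
    kind: str              # "write" or "read"
    value: Optional[str]   # value written, or value the read returned
    start: float           # invocation time
    end: float             # response time

def respects_real_time(order, ops: List[Op]) -> bool:
    # If op a completed before op b was invoked, a must come first in the order.
    pos = {idx: i for i, idx in enumerate(order)}
    return all(not (ops[a].end < ops[b].start and pos[a] > pos[b])
               for a in range(len(ops)) for b in range(len(ops)))

def legal_register(order, ops: List[Op], initial: str) -> bool:
    # Replay the order sequentially: every read must return the latest write.
    value = initial
    for idx in order:
        op = ops[idx]
        if op.kind == "write":
            value = op.value
        elif op.value != value:
            return False
    return True

def is_linearizable(ops: List[Op], initial: str) -> bool:
    return any(respects_real_time(order, ops) and legal_register(order, ops, initial)
               for order in permutations(range(len(ops))))

# A read concurrent with the write may still return the old value W.
concurrent = [Op("write", "X", 1, 4), Op("read", "W", 2, 3)]
# Hypothetical stale read: X was acked and read back, then a later read returns W.
stale = [Op("write", "X", 1, 2), Op("read", "X", 3, 4), Op("read", "W", 5, 6)]

print(is_linearizable(concurrent, "W"))  # True
print(is_linearizable(stale, "W"))       # False
```

The second history is exactly the pattern that should never be
observable in a linearizable store, while it is allowed under eventual
consistency.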

Thanks!
-slow

-- 
Ralph Boehme, Samba Team                 https://samba.org/
SerNet Samba Team Lead      https://sernet.de/en/team-samba