Ceph RADOS linearizable?
Ralph Boehme
slow at samba.org
Thu Mar 9 09:43:13 UTC 2023
Hi David,
On 3/8/23 21:38, David Disseldorp wrote:
> I think it's easier to express this as a protocol sequence diagram. As I
> understand things, Ceph omap will provide such guarantees for OSD
> acknowledged requests, assuming perfect time synchronisation between
> client clocks e.g.
>
> t   Client              Client              primary             replica
> .   1                   2                   OSD                 OSDs
> .   |                   |                   |                   |
> 1   |>set_omap(A=X)-------->--------------->|                   |
> .   |                   |                   |   set_omap(A=X)   |
> .   |                   |                   |>-------->-------->|
> .   |                   |                   |                   |
> .   |                   |                   |<---ack---<-------<|
> .   |                   |                   |                   |
> .   |<---ack---------------<---------------<|                   |
> .   |                   |                   |                   |
> 2   |                   |>get_omap(A)------>|                   |
> .   |                   |                   |                   |
> .   |                   |<-ack(A=X)--------<|                   |
> .   |                   |                   |                   |
>
> IIUC, if Client 2 above were to send its request before Client 1
> received the set_omap acknowledgement, then the get_omap response would
> either be A=X *or* the earlier value of A, although nothing in between.
Thanks! Yes, this is what replication looks like in systems that
replicate through a strong leader. :)
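For readers following along, the flow in the diagram can be sketched as a toy model in Python. This is only an illustration of leader-based (primary-copy) replication under my own assumptions: the class names Primary and Replica are hypothetical, everything runs in one process, and it says nothing about how Ceph actually implements its OSDs.

```python
class Replica:
    """Hypothetical replica OSD: applies whatever the primary forwards."""

    def __init__(self):
        self.omap = {}

    def apply(self, key, value):
        self.omap[key] = value
        return True  # the "ack" sent back to the primary


class Primary:
    """Hypothetical primary OSD: the single point that orders all requests."""

    def __init__(self, replicas):
        self.omap = {}
        self.replicas = replicas

    def set_omap(self, key, value):
        # Forward the update and collect an ack from every replica
        # before acknowledging the client.
        acks = [replica.apply(key, value) for replica in self.replicas]
        if not all(acks):
            raise RuntimeError("replication failed, write not acknowledged")
        self.omap[key] = value
        return "ack"

    def get_omap(self, key):
        # Reads are served by the primary only, so a read that arrives
        # after an acknowledged write observes that write.
        return self.omap.get(key)


replicas = [Replica(), Replica()]
primary = Primary(replicas)
primary.set_omap("A", "X")   # Client 1, t=1
print(primary.get_omap("A"))  # Client 2, t=2
```

Because every request passes through one primary, the order in which the primary handles requests is the order all clients observe, which is the property the diagram relies on.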
The problem is not that I don't understand how distributed databases
and the related distributed consensus algorithms work; I was basically
just wondering why it's not documented which consistency level librados
provides when used as a k/v store. Given the proliferation of distributed
databases these days that provide different consistency levels, I'd love
to see products clearly document what you can expect. So I guess I should
really have brought this up on ceph-devel and not here, as I'm barking up
the wrong tree. :)))
Digging some more, I found
https://repositorio-aberto.up.pt/bitstream/10216/139563/2/529181.pdf
which states that RADOS uses Multi-Paxos, which implies linearizability.
But this one here still makes me nervous:
https://tracker.ceph.com/issues/50719. Here the client suddenly sees an
old value, something you'd expect from a Dynamo-style eventually
consistent database (Cassandra, Scylla, Yugabyte, ...), but not from a
linearizable one.
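To make "sees an old value" concrete, here is a minimal sketch in Python of the check I have in mind. It is not a full linearizability checker and has nothing to do with any Ceph tooling: the Op record and find_stale_reads are hypothetical names, and it assumes a single key, non-overlapping writes that each write a distinct value, and timestamps taken from one shared clock. It flags any read that started after a write was acknowledged but still returned something older than that write.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    kind: str              # "write" or "read"
    value: Optional[str]   # value written, or value the read returned
    start: float           # time the client sent the request
    end: float             # time the client received the ack


def find_stale_reads(history, initial=None):
    """Return reads that violate the real-time order of acknowledged writes."""
    writes = sorted((op for op in history if op.kind == "write"),
                    key=lambda op: op.start)
    # Position of each value in write order; the initial value comes first.
    order = {initial: -1}
    for index, write in enumerate(writes):
        order[write.value] = index

    stale = []
    for read in (op for op in history if op.kind == "read"):
        # Newest write that was already acknowledged when the read began.
        acked = [order[w.value] for w in writes if w.end < read.start]
        floor = max(acked, default=-1)
        # Values never written are treated as older than any write.
        if order.get(read.value, -1) < floor:
            stale.append(read)
    return stale


history = [
    Op("write", "X", 1.0, 2.0),
    Op("write", "Y", 3.0, 4.0),
    Op("read", "X", 5.0, 6.0),   # begins after Y was acked, still returns X
]
print(find_stale_reads(history))
```

A read that overlaps a write in time may legitimately return either the old or the new value, so the sketch only looks at writes whose ack arrived before the read was sent.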
Thanks!
-slow
--
Ralph Boehme, Samba Team https://samba.org/
SerNet Samba Team Lead https://sernet.de/en/team-samba