Ceph RADOS linearizable?
David Disseldorp
ddiss at samba.org
Wed Mar 8 20:38:30 UTC 2023
On Wed, 8 Mar 2023 19:18:54 +0100, Ralph Boehme via samba-technical wrote:
> Hi David,
>
> On 3/8/23 18:47, David Disseldorp wrote:
> > This is a question better suited to the Ceph development list, but I'll do
> > my best to try to answer...
>
> you're right. Sorry for being lazy and trying to shortcut... :)))
Well, just keep in mind you'll get a much more informed answer there :)
> >> Can you confirm whether RADOS is indeed linearizable? I'm pretty sure
> >> it is, but would like to be sure. :)
> >
> > RADOS is a very broad interface when considering linearizability, but
> > if you choose to focus on key/value storage accessed via the Ceph omap
> > interface, then yes, my understanding is that OSD requests for a single
> > object are processed in a way that provides atomic consistency from a
> > RADOS client perspective.
>
> well, atomicity is one point related to the single operations or
> transactions, consistency is a broader concept dealing with the ordering
> and relation between different operations.
>
> With linearizable consistency, which is the strongest consistency you
> can get with single-key operations, you're guaranteed that operations
> appear in an order consistent with the real-time ordering of those
> operations. Which is another way to say that for
>
> Time 1: Client 1: set A to X
> Time 2: Client 2: get A -> ?
>
> with linearizable consistency it's guaranteed that client 2 reads "X".
> Which is not the case with weaker consistency levels where the client is
> allowed to see the old value (whatever that was).
I think it's easier to express this as a protocol sequence diagram. As I
understand things, Ceph omap will provide such guarantees for
OSD-acknowledged requests, assuming perfect time synchronisation between
client clocks, e.g.:

t  Client              Client              primary            replica
.    1                   2                   OSD                OSDs
.    |                   |                    |                  |
1    |>set_omap(A=X)----->------------------->|                  |
.    |                   |                    | set_omap(A=X)    |
.    |                   |                    |>-------->------->|
.    |                   |                    |                  |
.    |                   |                    |<---ack--<-------<|
.    |                   |                    |                  |
.    |<---ack------------<-------------------<|                  |
.    |                   |                    |                  |
2    |                   |>get_omap(A)------->|                  |
.    |                   |                    |                  |
.    |                   |<-ack(A=X)---------<|                  |
.    |                   |                    |                  |

IIUC, if Client 2 above were to send its request before Client 1
received the set_omap acknowledgement, then the get_omap response would
either be A=X *or* the earlier value of A, although nothing in between.
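
To make the two cases concrete, here is a rough Python sketch of which
values a single-key read may return under linearizability when it races
one write. It is only a toy model of the timing rule discussed above, not
Ceph or librados code: the function name allowed_reads, its parameters and
the idea of a shared global clock are all assumptions made for
illustration.

```python
def allowed_reads(write_start, write_ack, read_start, read_end, old, new):
    """Return the set of values a linearizable register may hand back.

    All four timestamps come from a hypothetical, perfectly synchronised
    global clock (an assumption; real clients have no such clock).
    """
    if read_start > write_ack:
        # The read was issued after the write was acknowledged, so the
        # write must be ordered first: only the new value is allowed.
        return {new}
    if read_end < write_start:
        # The read finished before the write was even issued, so it can
        # only have seen the previous value.
        return {old}
    # The two operations overlap in real time: either order is a valid
    # linearization, so the old or the new value may be returned, but
    # never anything in between.
    return {old, new}


if __name__ == "__main__":
    # Case from the diagram: get_omap is sent after the set_omap ack.
    print(allowed_reads(1.0, 1.5, 2.0, 2.2, old="W", new="X"))
    # Concurrent case: get_omap is sent before the set_omap ack arrives.
    print(sorted(allowed_reads(1.0, 1.5, 1.2, 1.4, old="W", new="X")))
```

The first call corresponds to the diagram (read issued after the ack),
the second to the concurrent case where both outcomes are acceptable.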
Cheers, David