Ceph RADOS linearizable?

David Disseldorp ddiss at samba.org
Wed Mar 8 20:38:30 UTC 2023


On Wed, 8 Mar 2023 19:18:54 +0100, Ralph Boehme via samba-technical wrote:

> Hi David,
> 
> On 3/8/23 18:47, David Disseldorp wrote:
> > This is a question better suited to the Ceph development list, but I'll do
> > my best to try to answer...  
> 
> you're right. Sorry for being lazy and trying to shortcut... :)))

Well, just keep in mind you'll get a much more informed answer there :)

> >> Can you confirm whether RADOS is indeed linearizable? I'm pretty sure
> >> it is, but would like to be sure. :)  
> > 
> > RADOS is a very broad interface when considering linearizability, but
> > if you choose to focus on key/value storage accessed via the Ceph omap
> > interface, then yes, my understanding is that OSD requests for a single
> > object are processed in a way that provides atomic consistency from a
> > RADOS client perspective.  
> 
> well, atomicity is one point related to the single operations or 
> transactions, consistency is a broader concept dealing with the ordering 
> and relation between different operations.
> 
> With linearizable consistency, which is the strongest consistency you 
> can get with single-key operations, you're guaranteed that operations 
> appear in an order consistent with the real-time ordering of those 
> operations. Which is another way to say that for
> 
> Time 1: Client 1: set A to X
> Time 2: Client 2: get A -> ?
> 
> with linearizable consistency it's guaranteed that client 2 reads "X". 
> Which is not the case with weaker consistency levels where the client is 
> allowed to see the old value (whatever that was).

I think it's easier to express this as a protocol sequence diagram. As I
understand things, Ceph omap will provide such guarantees for
OSD-acknowledged requests, assuming perfect time synchronisation between
client clocks, e.g.:

t  Client         Client               primary             replica
.    1               2                   OSD                OSDs
.    |               |                    |                  |
1    |>set_omap(A=X)----->--------------->|                  |
.    |               |                    |   set_omap(A=X)  |
.    |               |                    |>-------->------->|
.    |               |                    |                  |
.    |               |                    |<---ack--<-------<|
.    |               |                    |                  |
.    |<---ack------------<---------------<|                  |
.    |               |                    |                  |
2    |               |>get_omap(A)------->|                  |
.    |               |                    |                  |
.    |               |<-ack(A=X)---------<|                  |
.    |               |                    |                  |

IIUC, if Client 2 above were to send its request before Client 1
received the set_omap acknowledgement, then the get_omap response would
either be A=X *or* the earlier value of A, although nothing in between.
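
To make the two cases concrete, here is a small Python sketch of which
values a linearizable single-key store is allowed to return for one read
relative to one write. This is only a toy model of the reasoning above, not
Ceph or librados code: the names Write and allowed_reads are made up for
illustration, and the timestamps assume the same perfectly synchronised
client clocks as the diagram.

```python
from dataclasses import dataclass


@dataclass
class Write:
    value: str      # new value for key A
    invoked: float  # time the client sent the write request
    acked: float    # time the client received the acknowledgement


def allowed_reads(initial, write, read_invoked, read_acked):
    """Return the set of values a linearizable store may hand back for a
    single read of key A, given one write to the same key."""
    if read_invoked > write.acked:
        # Read was sent after the write was acknowledged: only the new
        # value is permitted.
        return {write.value}
    if read_acked < write.invoked:
        # Read completed before the write was even sent: only the old
        # value is permitted.
        return {initial}
    # Read and write overlap in time: either value is fine, but nothing
    # else (no partial or intermediate state).
    return {initial, write.value}


# Hypothetical timings: the write is sent at t=1.0 and acknowledged at t=1.5.
w = Write(value="X", invoked=1.0, acked=1.5)
after_ack = allowed_reads("old", w, read_invoked=2.0, read_acked=2.2)
overlapping = allowed_reads("old", w, read_invoked=1.2, read_acked=1.4)
```

With those made-up timings, after_ack only permits "X" (the diagram above),
while overlapping permits either "old" or "X" (the racing case).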

Cheers, David


