[Linux-cluster] Samba Technical thread

Fri Dec 10 21:08:56 GMT 2004

Hi Chris,

> > 2. server role (symmetric vs asymmetric)
> >    GFS aims to be server-less, NFS/CIFS are very server-based, and
> >    others can fall somewhere in between.  (If you consider using
> >    GFS above iscsi or nbd then the differences become even more
> >    subtle.)
>
> I think it's the second point that's key.  Each node in the GFS
> cluster is its own client and server, interacting with the other
> nodes that also have the device mounted.

Point 2 is potentially misleading.  A better way to put it:

2.  Uses global synchronization
    GFS uses a global lock manager that allows each node to synchronize
    access directly to the data device and implement per-node caching
    with accurate semantics.

Whether the lock manager is symmetric, server-based or hierarchical is
irrelevant.  The point is, when you have global locks you can distribute the 
server work (i.e., have multiple Samba servers working from the same 
underlying filesystem) instead of funnelling everything through a single 
machine.

We now have three different GFS subsystems that are server-based rather than 
symmetric: cluster snapshot, cluster mirror and gnbd network block device.  
And the production version of GFS uses a server-based lock manager.  In 
fact, even the "symmetric" gdlm will sometimes temporarily appoint a server 
to take care of a particularly complex task like recovery from a failed 
node that was mastering some locks.

The point of this is, if you can't address your synchronization problem 
using a server, then you certainly can't address it using a distributed 
system.  It's a case of walking before running.  If you start off by 
getting tangled in distributed database and multiple master issues then 
that will be your whole project: solving distributed database issues, and 
you will never get to the actual problem, which is synchronizing Samba 
servers across multiple nodes.

So I humbly suggest that you design a server-based implementation first, 
get it working, then think about how you might go about distributing the 
database, or if that is even wise.  The more complex the database, the less 
wise it is.

Just to clarify, when I talk about a synchronization server here, I am 
talking about a server that provides synchronization services to multiple 
Samba servers.  After all, the whole point of the exercise is to distribute 
the Samba server.  In solving that problem, you don't want to be solving 
the problem of how to distribute the synchronization server as well.  In 
fact, that's exactly how GFS progressed: the lock manager currently in 
production is a single server with hot standbys.  It's only the alpha 
version in 2.6 that has the distributed lock manager.  It took years to 
move from one to the other, and actually, the move isn't finished yet.

I hope I'm succeeding in communicating the message that by distributing the 
shared database, you exacerbate the combinatorics of an already 
challenging situation.  A distributed database, after all, consists only of 
multiple servers, each serving a part of the database domain, and 
requires an additional layer of synchronization between servers, plus an 
additional layer of recovery mechanisms.  You end up with two layers of 
cluster synchronization: one layer consists of the (symmetric) Samba 
servers, and the other layer consists of synchronization servers for the 
Samba servers.  If your database domain is simple and flat like a lock 
space, this isn't too horribly bad, but if it's a tree (say) then 
distributing it gets ugly very quickly.

To put this in concrete terms, consider how you might go about sharing a 
tdb between multiple Samba servers.  You can frontend the tdb with a 
message handler that receives name lookup, create or delete requests and 
accesses/updates the database accordingly.  You can put in some extra 
effort and have the server keep a list of which Samba server has requested 
each name, so that the Samba servers can cache names locally and the server 
can send out invalidation messages if somebody wants to change a name.  
Pretty easy, right?  Now consider the problem of rewriting tdb to be a 
distributed database so that each Samba server accesses the shared disk 
directly, and uses global locking to take care of parallel access and 
caching issues.  If this doesn't scare you, it should.
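
A minimal sketch of the simple server-based front end described above, 
assuming a plain dict in place of the tdb and an in-memory list in place of 
the network.  The names (NameServer, lookup, create, delete, outbox) are 
made up for illustration and are not Samba or tdb APIs:

```python
from collections import defaultdict


class NameServer:
    """Hypothetical front end for a shared name database.

    A dict stands in for the tdb; 'outbox' stands in for messages
    that would be sent to the Samba servers over the network.
    """

    def __init__(self):
        self.db = {}                     # name -> value
        self.cachers = defaultdict(set)  # name -> ids of servers caching it
        self.outbox = []                 # (server_id, "invalidate", name)

    def lookup(self, server_id, name):
        # Record that this Samba server may now cache the name, then answer.
        self.cachers[name].add(server_id)
        return self.db.get(name)

    def _invalidate(self, requester, name):
        # Tell every other server that cached this name to drop its copy.
        for sid in sorted(self.cachers.pop(name, set())):
            if sid != requester:
                self.outbox.append((sid, "invalidate", name))

    def create(self, server_id, name, value):
        self._invalidate(server_id, name)
        self.db[name] = value
        self.cachers[name].add(server_id)

    def delete(self, server_id, name):
        self._invalidate(server_id, name)
        self.db.pop(name, None)


server = NameServer()
server.create("smbd-a", "Report.doc", "entry 1")
server.lookup("smbd-b", "Report.doc")   # smbd-b now caches the name
server.delete("smbd-a", "Report.doc")   # smbd-b must be told to forget it
print(server.outbox)
```

All of the bookkeeping lives in one place, so there is nothing to keep 
consistent between servers except the cached names themselves.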

Notice that, after having gone to all the work of distributing your 
database, you probably didn't save anything versus the simple message 
dispatcher, since you still need flocks of messages flying around to do the 
database synchronization.

By the way, you don't actually need a distributed lock manager at all for 
implementing cluster synchronization.  I have demonstrated this by 
implementing a cluster snapshot block device and a cluster mirror using 
only messages for synchronization, no global locks at all.  This is 
obviously possible, because the distributed lock manager itself is 
implemented using only messages.  Messages are actually better than global 
locks for many problems (and I suspect, for your specific problem) because 
messages can carry an arbitrary amount of state information at the same 
time as acting as synchronizers.  If you try to do the same thing using 
global locks as synchronizers, you typically find yourself doing:

  1. Grab the global lock
  2. Go get the data
  3. Release the global lock

All three steps send messages and receive confirmations.  Using messages 
directly, you have:

  1. Go get the data.

This is three times fewer messages.  In many cases, you don't even have to ask 
for the data, it just arrives because the server knows you need it, so this 
is six times fewer messages than the dlm style synchronization.  Not only 
that, but you don't have to worry about keeping the locks synchronized with 
the data, which is messier than it sounds.
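
The arithmetic behind those ratios can be spelled out in a few lines.  This 
is only a counting sketch, assuming every request gets exactly one 
confirmation or reply; the function names are made up for illustration:

```python
def lock_style_messages():
    # Grab lock, get data, release lock: each step is a request
    # plus a confirmation, so two messages per step.
    steps = ["grab global lock", "get data", "release global lock"]
    return 2 * len(steps)


def message_style_messages(pushed=False):
    # Pull: one request plus one reply carrying the data.
    # Push: the server sends the data unasked, a single message.
    return 1 if pushed else 2


locks = lock_style_messages()
print(locks // message_style_messages())             # ratio when asking for data
print(locks // message_style_messages(pushed=True))  # ratio when data is pushed
```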

I hope the above is helpful in thinking about this effort in concrete terms.  
To get even more concrete, I suggest you make a list of what data you need 
to synchronize between Samba servers (e.g., case translation database; 
Windows-specific locks).  It might be helpful to make another list of what 
GFS already synchronizes for you (e.g., directory namespace; file data).

Regards,

Daniel