[Linux-cluster] Samba Technical thread
Daniel Phillips
phillips at redhat.com
Fri Dec 10 21:08:56 GMT 2004
Hi Chris,
> > 2. server role (symmetric vs asymmetric)
> > GFS aims to be server-less, NFS/CIFS are very server-based, and
> > others can fall somewhere in between. (If you consider using
> > GFS above iscsi or nbd then the differences become even more
> > subtle.)
>
> I think it's the second point that's key. Each node in the GFS
> cluster is its own client and server, interacting with the other
> nodes that also have the device mounted.
Point 2 is potentially misleading. A better way to put it:
2. Uses global synchronization
GFS uses a global lock manager that allows each node to synchronize
access directly to the data device and implement per-node caching
with accurate semantics.
Whether the lock manager is symmetric, server based or hierarchical is
irrelevant. The point is, when you have global locks you can distribute the
server work (i.e., have multiple Samba servers working from the same
underlying filesystem) instead of funnelling everything through a single
machine.
We now have three different GFS subsystems that are server-based rather than
symmetric: cluster snapshot, cluster mirror and gnbd network block device.
And the production version of GFS uses a server-based lock manager. In
fact, even the "symmetric" gdlm will sometimes temporarily appoint a server
to take care of a particularly complex task like recovery from a failed
node that was mastering some locks.
The point of this is, if you can't address your synchronization problem
using a server, then you certainly can't address it using a distributed
system. It's a case of walking before running. If you start off by
getting tangled in distributed database and multiple master issues then
that will be your whole project: solving distributed database issues, and
you will never get to the actual problem, which is synchronizing Samba
servers across multiple nodes.
So I humbly suggest that you design a server-based implementation first,
get it working, then think about how you might go about distributing the
database, or if that is even wise. The more complex the database, the less
wise it is.
Just to clarify, when I talk about a synchronization server here, I am
talking about a server that provides synchronization services to multiple
Samba servers. After all, the whole point of the exercise is to distribute
the Samba server. In solving that problem, you don't want to be solving
the problem of how to distribute the synchronization server as well. In
fact, that's exactly how GFS progressed: the lock manager currently in
production is a single server with hot standbys. It's only the alpha
version in 2.6 that has the distributed lock manager. It took years to
move from one to the other, and actually, the move isn't finished yet.
I hope I'm succeeding in communicating the message that by distributing the
shared database, you exacerbate the combinatorics of an already
challenging situation. A distributed database, after all, only consists of
multiple servers, each serving a part of the database domain, and
requires an additional layer of synchronization between servers, plus an
additional layer of recovery mechanisms. You end up with two layers of
cluster synchronization: one layer consists of the (symmetric) Samba
servers, and the other layer consists of synchronization servers for the
Samba servers. If your database domain is simple and flat like a lock
space, this isn't too horribly bad, but if it's a tree (say) then
distributing it gets ugly very quickly.
To put this in concrete terms, consider how you might go about sharing a
tdb between multiple Samba servers. You can frontend the tdb with a
message handler that receives name lookup, create or delete requests and
accesses/updates the database accordingly. You can put in some extra
effort and have the server keep a list of which Samba server has requested
each name, so that the Samba servers can cache names locally and the server
can send out invalidation messages if somebody wants to change a name.
Pretty easy, right? Now consider the problem of rewriting tdb to be a
distributed database so that each Samba server accesses the shared disk
directly, and uses global locking to take care of parallel access and
caching issues. If this doesn't scare you, it should.
Notice that, after having gone to all the work of distributing your
database, you probably didn't save anything versus the simple message
dispatcher, since you still need flocks of messages flying around to do the
database synchronization.
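As a rough illustration of the simple message dispatcher described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names NameServer and SambaNode are hypothetical, a plain dict stands in for the tdb, and direct method calls stand in for the network messages. It is not how Samba or tdb actually work, only the shape of the idea.

```python
from collections import defaultdict


class NameServer:
    """Hypothetical synchronization server fronting one shared name table."""

    def __init__(self):
        self.table = {}                  # stands in for the tdb
        self.readers = defaultdict(set)  # name -> ids of nodes caching it
        self.nodes = {}                  # node id -> node object

    def register(self, node):
        self.nodes[node.node_id] = node

    def lookup(self, node_id, name):
        # Remember who asked, so the cached copy can be invalidated later.
        self.readers[name].add(node_id)
        return self.table.get(name)

    def create(self, node_id, name, value):
        self._invalidate(name, skip=node_id)
        self.table[name] = value

    def delete(self, node_id, name):
        self._invalidate(name, skip=node_id)
        self.table.pop(name, None)

    def _invalidate(self, name, skip):
        # Send an invalidation "message" to every other node caching the name.
        for node_id in self.readers.pop(name, set()):
            if node_id != skip:
                self.nodes[node_id].invalidate(name)


class SambaNode:
    """Hypothetical Samba server that caches names it has looked up."""

    def __init__(self, node_id, server):
        self.node_id = node_id
        self.server = server
        self.cache = {}
        server.register(self)

    def lookup(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.lookup(self.node_id, name)
        return self.cache[name]

    def create(self, name, value):
        self.server.create(self.node_id, name, value)
        self.cache.pop(name, None)  # drop our own stale copy, if any

    def delete(self, name):
        self.server.delete(self.node_id, name)
        self.cache.pop(name, None)

    def invalidate(self, name):
        self.cache.pop(name, None)
```

In this sketch a node that has looked up a name keeps serving it from its local cache until another node creates or deletes that name, at which point the server tells it to forget the entry. All of the synchronization logic lives in one place, which is the attraction of the server-based design.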
By the way, you don't actually need a distributed lock manager at all for
implementing cluster synchronization. I have demonstrated this by
implementing a cluster snapshot block device and a cluster mirror using
only messages for synchronization, no global locks at all. This is
obviously possible, because the distributed lock manager itself is
implemented using only messages. Messages are actually better than global
locks for many problems (and I suspect, for your specific problem) because
messages can carry an arbitrary amount of state information at the same
time as acting as synchronizers. If you try to do the same thing using
global locks as synchronizers, you typically find yourself doing:
1. Grab the global lock
2. Go get the data
3. Release the global lock
All three steps send messages and receive confirmations. Using messages
directly, you have:
1. Go get the data.
This is three times fewer messages. In many cases, you don't even have to ask
for the data, it just arrives because the server knows you need it, so this
is six times fewer messages than the dlm style synchronization. Not only
that, but you don't have to worry about keeping the locks synchronized with
the data, which is messier than it sounds.
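The message arithmetic above can be checked with a small Python sketch. It assumes, as the text states, that every lock-style step costs one message out and one confirmation back. The function names are hypothetical and nothing here talks to a real lock manager; it only counts messages.

```python
def lock_style_messages():
    """Fetch data under a global lock: three steps, each a request plus a confirmation."""
    messages = []
    for step in ("grab global lock", "get data", "release global lock"):
        messages.append(("request", step))
        messages.append(("confirm", step))
    return messages


def request_style_messages():
    """Ask the server for the data directly: one request, one reply."""
    return [("request", "get data"), ("reply", "data")]


def push_style_messages():
    """The server already knows we need the data and just sends it."""
    return [("push", "data")]


lock_count = len(lock_style_messages())
request_count = len(request_style_messages())
push_count = len(push_style_messages())

print(lock_count, request_count, push_count)
print(lock_count // request_count, lock_count // push_count)
```

Under these assumptions the lock-style exchange costs six messages, the direct request two, and the pushed update one, which is where the factors of three and six come from.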
I hope the above is helpful in thinking about this effort in concrete terms.
To get even more concrete, I suggest you make a list of what data you need
to synchronize between Samba servers (e.g., case translation database;
Windows-specific locks). It might be helpful to make another list of what
GFS already synchronizes for you (e.g., directory namespace; file data).
Regards,
Daniel