tdb and file-per-hash-chain

Wed Feb 1 22:24:33 GMT 2006

Volker,

 > A file per key might be excessive, but nicely offloads all the lock management
 > etc into the cluster fs, so you can point at someone else if your performance
 > sucks :-)

I know you are partly joking, but you also need to think about whether
a file per key would be fast in the simplest case of a single
node.

When designing a cluster solution you need a design that runs
acceptably fast on a local filesystem without clustering. If it
doesn't run acceptably fast in that case then you have little chance
of getting good performance when it is spread across multiple nodes.

To test this, you could hack tdb to do fake file IOs per call, then
run this on a local filesystem and benchmark the result. My guess is
that it will be pretty slow. If you can show that it isn't then maybe
the file per key method is worth trying in a cluster.
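
As a rough illustration of that kind of test, here is a minimal Python
sketch. It does not touch tdb itself. It only compares the system call
pattern of a file per key (open, lock, write, close for every store)
against one already-open file with a one-byte fcntl lock per record,
which is a crude stand-in for a single-file database and not a model
of tdb's real layout. The function names, the SLOT record size and the
key count are all assumptions made up for the example, and it needs a
POSIX system since it uses fcntl locks.

```python
import fcntl
import os
import tempfile
import time

SLOT = 64  # hypothetical fixed record size for the single-file case


def store_file_per_key(dirpath, key, value):
    # One file per key: open, lock the whole file, write, close.
    fd = os.open(os.path.join(dirpath, key), os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)
        os.ftruncate(fd, 0)
        os.write(fd, value)
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def fetch_file_per_key(dirpath, key):
    # Reading a record costs another open/lock/read/close cycle.
    fd = os.open(os.path.join(dirpath, key), os.O_RDONLY)
    try:
        fcntl.lockf(fd, fcntl.LOCK_SH)
        return os.read(fd, SLOT)
    finally:
        os.close(fd)


def store_single_file(fd, slot, value):
    # One shared, already-open file: lock one byte for the slot,
    # write the record in place, then unlock.
    offset = slot * SLOT
    fcntl.lockf(fd, fcntl.LOCK_EX, 1, offset, os.SEEK_SET)
    try:
        os.pwrite(fd, value.ljust(SLOT, b"\0"), offset)
    finally:
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, offset, os.SEEK_SET)


def bench(nkeys=2000):
    # Returns (seconds for file per key, seconds for single file).
    value = b"share-mode-entry"
    with tempfile.TemporaryDirectory() as d:
        t0 = time.perf_counter()
        for i in range(nkeys):
            store_file_per_key(d, "key%d" % i, value)
        per_key = time.perf_counter() - t0

        fd = os.open(os.path.join(d, "single.db"),
                     os.O_RDWR | os.O_CREAT, 0o600)
        try:
            t0 = time.perf_counter()
            for i in range(nkeys):
                store_single_file(fd, i, value)
            single = time.perf_counter() - t0
        finally:
            os.close(fd)
    return per_key, single


if __name__ == "__main__":
    per_key, single = bench()
    print("file per key: %.4fs  single file: %.4fs" % (per_key, single))
```

The numbers will depend heavily on the filesystem, and a real test
would need to go through the tdb code paths, so treat this only as a
way to get a first feel for the relative costs on a single node.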

Also note that it still suffers from the same problem I mentioned to
James. When you do have contention (multiple clients operating on the
same file) you will be hitting the same key on multiple nodes in the
cluster. That brings back the same performance problems you are
trying to avoid.

 > In the clustered case we have the advantage that for the non-contended case the
 > share mode entry creator is the one who has to do business with that, ideally
 > during the file handle lifetime this does not need to be migrated anywhere
 > else. This puts high stress on the directory and inode creation code, but this
 > is the design space we have to explore and possibly adapt to different
 > clustered file systems.

As I think I've mentioned to you before, the clustered tdb approach
is only good as a proof of concept. Once past that stage I think you
must move the knowledge of clustering up a level, so that you have a
clustered solution for share modes, rather than a clustered solution
for tdb. The tdb model is pretty good for a local filesystem, but it
was designed knowing what the relative costs of system calls are on a
local filesystem. The design looks pretty bad when you change those
relative costs, as happens in the clustered case.

 > Except it's C++, but this might be something we can live with for a clustered
 > file server.

Hmm, sorry, I didn't know it was C++. I guess it depends how C++-ish
it is ...

Cheers, Tridge