Bugfix for tdb transactions

tridge at samba.org
Sun Jan 31 19:49:23 MST 2010


Hi Simo,

 > For example we could make the first process to open a tdb file to do a
 > lock on a specific bit of data, so that any other opener will know the
 > file is shared.

That is what TDB_CLEAR_IF_FIRST does, for non-persistent databases.
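
As a rough illustration of the idea (a minimal sketch using fcntl byte-range
locks, not the actual tdb code; the ACTIVE_BYTE offset and the function names
are made up for this example), the first opener can be detected by trying to
take an exclusive lock on a marker byte and then holding a shared lock on it:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical offset of the "somebody has this file open" marker byte. */
#define ACTIVE_BYTE 4

/* Apply a one-byte fcntl lock of the given type at the given offset. */
static int lock_byte(int fd, short type, int cmd, off_t offset)
{
	struct flock fl = {0};

	fl.l_type = type;
	fl.l_whence = SEEK_SET;
	fl.l_start = offset;
	fl.l_len = 1;
	return fcntl(fd, cmd, &fl);
}

/*
 * Returns 1 if no other process holds the marker byte (we look like the
 * first opener), 0 if the exclusive lock attempt failed (treated here as
 * "someone else has it open"), -1 if the shared lock cannot be taken.
 * On success a shared lock stays on the marker byte until the process
 * closes the file or exits.
 */
int open_is_first(int fd)
{
	/* Non-blocking attempt: succeeds only if nobody else holds the byte. */
	int first = (lock_byte(fd, F_WRLCK, F_SETLK, ACTIVE_BYTE) == 0);

	/* Take (or downgrade to) a shared lock so later openers can see us. */
	if (lock_byte(fd, F_RDLCK, F_SETLKW, ACTIVE_BYTE) != 0)
		return -1;
	return first;
}
```

Note that every process holding the file open keeps one lock on the same
inode, so the number of locks grows with the number of openers.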

I removed the use of TDB_CLEAR_IF_FIRST in s4 quite a while ago as it is
terrible for scalability. Many OSes (including Linux) use a linear
list of locks per inode. So each time you lock/unlock a single range
in a file, it needs to walk the entire list.

If you look at http://samba.org/ftp/unpacked/junkcode/lock_scaling.c
you'll see a demo of this. On my Linux laptop I see a 1000x slowdown
in locking between having 1 lock on a file and having 10000 locks on a
file. The slowdown is much worse on some other platforms. On systems
like zLinux the VM context switching makes the cost of this list
extremely high (I've seen zLinux print servers where this locking cost
completely dominated the CPU usage of the system).

s3 still uses TDB_CLEAR_IF_FIRST for non-persistent databases, but I
wouldn't want to enshrine this in the code forever.

(There was some discussion of changing the Linux kernel to use
red-black trees for locking to solve this scaling problem, but it
hasn't happened.)

 > At the same time we set up an inotify watch to know from
 > the first process if any other process opens the tdb.

I don't think we want to use inotify for this sort of thing. I'd far
rather just get the locking right.

Cheers, Tridge

