[Samba] ctdb vacuum timeouts and record locks

Computerisms Corporation bob at computerisms.ca
Fri Oct 27 17:09:56 UTC 2017


Hi Martin,

Thanks for reading and taking the time to reply.

>> ctdbd[89]: Unable to get RECORD lock on database locking.tdb for 20 seconds
>> /usr/local/samba/etc/ctdb/debug_locks.sh: 142:
>> /usr/local/samba/etc/ctdb/debug_locks.sh: cannot create : Directory
>> nonexistent
>> sh: echo: I/O error
>> sh: echo: I/O error
> 
> That's weird.  The only file really created by that script is the lock
> file that is used to make sure we don't debug locks too many times.
> That should be in:
> 
>    "${CTDB_SCRIPT_VARDIR}/debug_locks.lock"

Next time it happens I will check this.

> The other possibility is the use of the script_log() function to try to
> get the output logged.  script_log() isn't my greatest moment.  When
> debugging you could just replace it with the logger command to get the
> output out to syslog.

Okay, that sounds useful, will see what I can do next time I see the 
problem...
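
For the record, a minimal sketch of what such a replacement might look 
like.  The function name log_to_syslog is made up, and the calling 
convention (tag first, then the message either as arguments or on stdin) 
is an assumption about how script_log() gets called, not something 
checked against the ctdb scripts.  The LOGGER override is only there so 
the function can be tried out without writing to syslog:

```shell
# Hypothetical stand-in for script_log(): send output to syslog via logger.
# Assumption: first argument is a tag, the message follows as arguments
# or, if there are none, arrives on stdin.
log_to_syslog ()
{
    _tag="$1"
    shift
    if [ $# -gt 0 ] ; then
        # Message passed as arguments: log it as a single entry
        ${LOGGER:-logger} -t "$_tag" "$*"
    else
        # No arguments: logger reads the message from stdin
        ${LOGGER:-logger} -t "$_tag"
    fi
}

# Usage (the tag is an arbitrary example):
#   some_debug_command 2>&1 | log_to_syslog "ctdbd-lock"
```

That way the output would land in syslog regardless of what happens to 
the script's own log file handling.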

>> My setup is two servers, the OS is debian and is running samba AD on
>> dedicated SSDs, and each server has a RAID array of HDDs for storage,
>> with a mirrored GlusterFS running on top of them.  Each OS has an LXC
>> container running the clustered member servers with the GlusterFS
>> mounted to the containers.  The tdb files are in the containers, not on
>> the shared storage.  I do not use ctdb to start smbd/nmbd.  I can't
>> think what else is relevant about my setup as it pertains to this issue...
> 
> Are the TDB files really on a FUSE filesystem?  Is that an artifact of
> the LXC containers?  If so, could it be that locking isn't reliable on
> the FUSE filesystem?

No.  The TDB files are in the container, and the container is on the SSD 
with the OS.  Running mount from within the container shows:

/dev/sda1 on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)

However, the gluster native client is FUSE-based, so the data is stored 
on a FUSE filesystem which is mounted in the container:

masterchieflian:ctfngluster on /CTFN type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,allow_other,max_read=131072)
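
To answer the "which filesystem is this path really on" question for a 
single path instead of reading through the whole mount table, something 
like the following sketch could be used.  The function name fstype_of is 
made up, and it relies on the -T option of df, which GNU coreutils 
provides but POSIX does not require:

```shell
# Print the filesystem type backing the path given as first argument.
# -P keeps each filesystem on one line; -T adds the type column
# (a GNU df extension, so this is not portable to every system).
fstype_of ()
{
    df -PT "$1" | awk 'NR == 2 { print $2 }'
}

# Usage, with the two mount points shown above:
#   fstype_of /
#   fstype_of /CTFN
```

Pointing it at the directory that holds the TDB files would show 
directly whether they sit on the local filesystem or on the FUSE mount.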

Since this is where the files that become inaccessible live, perhaps 
this is where the problem really is, and not with the locking.tdb file?  
I will investigate file locking on the gluster system...
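
As a first, rough check, a sketch along these lines could show whether a 
lock taken on a file in a given directory is actually refused to a 
second opener.  The function name and the scratch file name are made up.  
Note the limits: it uses the flock utility from util-linux, so it 
exercises flock(2)-style whole-file locks only, not fcntl byte-range 
locks, and run on one node it says nothing about locking between nodes:

```shell
# Returns 0 if a second open of the file is refused the lock while the
# first holder still has it, 1 if the second lock was wrongly granted,
# 2 if even the first lock could not be taken.
lock_conflict_detected ()
{
    _f="$1/.locktest.$$"
    (
        # First holder: exclusive, non-blocking lock on fd 9
        flock -n 9 || exit 2
        # Second attempt opens the file again, so it must be refused
        if flock -n "$_f" true ; then
            exit 1
        fi
        exit 0
    ) 9>"$_f"
    _rc=$?
    rm -f "$_f"
    return $_rc
}

# Usage, with the gluster mount point from above:
#   lock_conflict_detected /CTFN && echo "locks conflict as expected"
```

A cross-node test would need the lock held on one server while the 
other one tries to take it, which this sketch does not attempt.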

> Is it possible to try this without the containers?  That would
> certainly tell you if the problem is related to the container
> infrastructure...

I like to think everything is possible, but it's not really feasible in 
this case.  Since there are only two physical servers, and they need to 
be running AD, the only way to separate the containers now is with 
additional machines to act as member servers.  And because everything 
tested fine and actually was fine for at least two weeks, these servers 
are in production now and have been for a few months.  If I have to go 
this way, it will certainly be a last resort...

Thanks again for your reply, will get back to you with what I find...




> 
> peace & happiness,
> martin
> 


