[Samba] ctdb vacuum timeouts and record locks
Computerisms Corporation
bob at computerisms.ca
Thu Nov 2 18:17:27 UTC 2017
Hi,
This occurred again this morning. When the user reported the problem, I
found in the ctdb logs that vacuuming had been going on since last
night. The need to fix it was urgent (when isn't it?), so rather than
poke around for clues I immediately restarted the lxc container. But
this time it wouldn't restart, which I had time to trace to a hung
smbd process, and between that and a run of the debug_locks.sh
script, I traced it back to the user reporting the problem. Given that
the user was primarily having problems with files in a given folder, I
am thinking this is because of some kind of lock on a file within that
folder.
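
Roughly what the tracing looked like, reconstructed from memory
(<PID> is a placeholder for the hung smbd; checking /proc is just my
usual habit, not anything ctdb-specific):

   ps axf | grep smbd        # spot the smbd that is stuck
   cat /proc/<PID>/stack     # see where in the kernel it is blocked
   /usr/local/samba/etc/ctdb/debug_locks.sh
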
Ended up rebooting both physical machines; problem solved. For now.
So, not sure how to determine if this is a gluster problem, an lxc
problem, or a ctdb/smbd problem. Thoughts/suggestions are welcome...
On 2017-10-27 10:09 AM, Computerisms Corporation via samba wrote:
> Hi Martin,
>
> Thanks for reading and taking the time to reply
>
>>> ctdbd[89]: Unable to get RECORD lock on database locking.tdb for 20
>>> seconds
>>> /usr/local/samba/etc/ctdb/debug_locks.sh: 142:
>>> /usr/local/samba/etc/ctdb/debug_locks.sh: cannot create : Directory
>>> nonexistent
>>> sh: echo: I/O error
>>> sh: echo: I/O error
>>
>> That's weird. The only file really created by that script is the lock
>> file that is used to make sure we don't debug locks too many times.
>> That should be in:
>>
>> "${CTDB_SCRIPT_VARDIR}/debug_locks.lock"
>
> Next time it happens I will check this.
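>
> Something like this, I assume, once I substitute whatever
> CTDB_SCRIPT_VARDIR expands to on this build (it is set by ctdb
> itself, not in my shell):
>
>    ls -ld "$CTDB_SCRIPT_VARDIR"
>    ls -l "$CTDB_SCRIPT_VARDIR/debug_locks.lock"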
>
>> The other possibility is the use of the script_log() function to try to
>> get the output logged. script_log() isn't my greatest moment. When
>> debugging you could just replace it with the logger command to get the
>> output out to syslog.
>
> Okay, that sounds useful, will see what I can do next time I see the
> problem...
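>
> Presumably that means an edit along these lines in debug_locks.sh (a
> rough sketch; I haven't checked exactly how script_log is invoked in
> my copy of the script):
>
>    # for debugging: send output straight to syslog instead of
>    # through script_log
>    some_debug_command 2>&1 | logger -t ctdb-debug-locks
>
> where some_debug_command stands in for whatever currently pipes into
> script_log.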
>
>>> My setup is two servers; the OS is Debian running Samba AD on
>>> dedicated SSDs, and each server has a RAID array of HDDs for storage
>>> with a mirrored GlusterFS running on top of them. Each OS has an LXC
>>> container running the clustered member servers, with the GlusterFS
>>> mounted in the containers. The tdb files are in the containers, not on
>>> the shared storage. I do not use ctdb to start smbd/nmbd. I can't
>>> think what else is relevant about my setup as it pertains to this
>>> issue...
>>
>> Are the TDB files really on a FUSE filesystem? Is that an artifact of
>> the LXC containers? If so, could it be that locking isn't reliable on
>> the FUSE filesystem?
>
> No. The TDB files are in the container, and the container is on the SSD
> with the OS. Running mount from within the container shows:
>
> /dev/sda1 on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)
>
> However, the gluster native client is a fuse-based system, so the data
> is stored on a fuse system which is mounted in the container:
>
> masterchieflian:ctfngluster on /CTFN type fuse.glusterfs
> (rw,relatime,user_id=0,group_id=0,allow_other,max_read=131072)
>
> Since this is where the files that become inaccessible are, perhaps this
> is really where the problem is, and not with the locking.tdb file? I
> will investigate file locks on the gluster system...
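>
> Probably the first thing to try is a crude lock test right on the
> gluster mount, something like this with flock(1) from util-linux
> (/CTFN/locktest is just a scratch file; note this exercises
> flock()-style locks, and fcntl byte-range locks could still behave
> differently):
>
>    touch /CTFN/locktest
>    flock -x /CTFN/locktest -c 'echo holding; sleep 30' &
>    flock -xn /CTFN/locktest -c 'echo lock NOT enforced'
>
> If the second command prints while the first still holds the lock,
> advisory locking on the FUSE mount is broken.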
>
>> Is it possible to try this without the containers? That would
>> certainly tell you if the problem is related to the container
>> infrastructure...
>
> I like to think everything is possible, but it's not really feasible in
> this case. Since there are only two physical servers, and they need to
> be running AD, the only way to separate the containers now is with
> additional machines to act as member servers. And because everything
> tested fine and actually was fine for at least two weeks, these servers
> are in production now and have been for a few months. If I have to go
> this way, it will certainly be a last resort...
>
> Thanks again for your reply, will get back to you with what I find...
>
>
>
>
>>
>> peace & happiness,
>> martin
>>
>