[Samba] Problems with TDBs on CTDB-managed Samba instance

Howard, Stewart Jameson sjhoward at iu.edu
Fri Oct 16 14:44:36 UTC 2015


Hi All,


My site has two separate clustered Samba instances (managed by two independent CTDB instances) running over GPFS.  In the last couple of weeks, we have seen a recurring issue that causes the `smbd` process in *one* of these instances to become unresponsive (as seen by CTDB), which results in flapping of CTDB and multiple IP takeover runs.


The symptoms that we observe are:


1)  Samba becomes unresponsive


2)  The output of `smbstatus` starts to show "-1" for each connection where it should be showing user/group information.


3)  Samba starts terminating connected sessions and CTDB kills its IP address


4)  After some thrashing (Samba restarts, presumably), CTDB is able to recover and start serving again


We have noticed that the following messages have started appearing in syslog, as well as in the winbind log on the afflicted cluster:


"""

[2015/10/16 10:25:30.892468,  0] ../source3/lib/util_tdb.c:313(tdb_log)
  tdb(<PATH OMITTED>/gencache_notrans.tdb): tdb_rec_read bad magic 0xd9fee666 at offset=517632


[2015/10/16 10:25:37.827964,  0] ../source3/lib/util_tdb.c:313(tdb_log)
  tdb(<PATH OMITTED>/gencache_notrans.tdb): tdb_expand overflow detected current map_size[4294967295] size[124]!

"""


These messages appear in *great* number, especially the message about "tdb_expand overflow detected."  Interestingly, the size in bytes of the file it mentions is exactly the map_size value that the error message reports (4294967295, which is 2^32 - 1):


"""

[root@<HOST> lock]# ll gencache_notrans.tdb
-rw-r--r-- 1 root root 4294967295 Oct 16 10:39 gencache_notrans.tdb

"""


On the Samba cluster that is problem-free, this file is a mere ~500K:


"""

[root@rsgwb2 lock]# ll gencache_notrans.tdb
-rw-r--r-- 1 root root 528384 Oct 16 10:40 gencache_notrans.tdb

"""


Although we do not yet understand the cause of the current issue, we suspect that it is related to the enormous size of gencache_notrans.tdb.


Can anybody comment on what this file is for?  Looking at


https://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/install.html#tdbdocs


I see no description of this file, only of gencache.tdb.  Also, if anyone has experience with this type of issue or insight into it, your help is greatly appreciated  :)


Thank you so much for your time!


Stewart Howard

