[Samba] mutex.tdb locking errors on Solaris 10
S.Kirk at soton.ac.uk
Thu Apr 26 07:31:49 MDT 2012
We are experiencing a problem with Samba 3.6.4 on Solaris 10 update 10. The problem started after an upgrade to v3.6.3 and was still present after rebuilding with 3.6.4. We are using the version of Samba packaged by OpenCSW.
From a client perspective, the issue shows up as intermittent very poor performance, or as an intermittent inability to save a file to the share at all. On the server side, each occurrence appears to coincide with the following log messages:
[2012/04/26 10:55:07.283496, 1] ../lib/util/tdb_wrap.c:65(tdb_wrap_log)
tdb(/var/opt/csw/samba/locks/mutex.tdb): tdb_lock failed on list 2 ltype=2 (Interrupted system call)
[2012/04/26 10:55:07.283893, 0] lib/util_tdb.c:72(tdb_chainlock_with_timeout_internal)
tdb_chainlock_with_timeout_internal: alarm (10) timed out for key replay cache mutex in tdb /var/opt/csw/samba/locks/mutex.tdb
[2012/04/26 10:55:07.284235, 1] lib/server_mutex.c:74(grab_named_mutex)
Could not get the lock for replay cache mutex
[2012/04/26 10:55:07.284611, 1] libads/kerberos_verify.c:560(ads_verify_ticket)
libads/kerberos_verify.c:559: unable to protect replay cache with mutex.
[2012/04/26 10:55:07.284978, 1] smbd/sesssetup.c:342(reply_spnego_kerberos)
Failed to verify incoming ticket with error NT_STATUS_LOGON_FAILURE!
[2012/04/26 10:55:07.285300, 3] smbd/error.c:81(error_packet_set)
error packet at smbd/sesssetup.c(344) cmd=115 (SMBsesssetupX) NT_STATUS_LOGON_FAILURE
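To see how often this happens and whether it lines up with the client-side symptoms, the log can be summarised per minute. The following is a minimal sketch, assuming every entry has the two-part layout shown in the excerpt above (a bracketed header with timestamp, level, source location and function, followed by the message text). The names `HEADER`, `parse_entries` and `mutex_timeouts_per_minute` are made up for this illustration and are not part of Samba.

```python
import re
from collections import Counter

# Header line of a log entry, assumed to look like the excerpt, e.g.
# [2012/04/26 10:55:07.283496, 1] lib/server_mutex.c:74(grab_named_mutex)
HEADER = re.compile(
    r"^\[(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2})\.\d+,"
    r"\s*(?P<level>\d+)\]\s+(?P<source>[^\s(]+)\((?P<function>\w+)\)\s*$"
)


def parse_entries(lines):
    """Pair each header line with the message line(s) that follow it."""
    entries = []
    for line in lines:
        match = HEADER.match(line)
        if match:
            entries.append({**match.groupdict(), "message": ""})
        elif entries:
            # Continuation text belongs to the most recent header.
            entry = entries[-1]
            entry["message"] = (entry["message"] + " " + line.strip()).strip()
    return entries


def mutex_timeouts_per_minute(entries):
    """Count entries logged by grab_named_mutex, keyed by 'date HH:MM'."""
    counts = Counter()
    for entry in entries:
        if entry["function"] == "grab_named_mutex":
            counts[f"{entry['date']} {entry['time'][:5]}"] += 1
    return counts
```

Feeding the lines of the smbd log file to `parse_entries` and passing the result to `mutex_timeouts_per_minute` gives a count per minute that can be compared against the times users report slow or failed saves. If the log uses a different header format (for example with extra fields enabled), the pattern would need adjusting.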
It's not clear what the mutex.tdb file actually does or contains; none of the documentation I've found says what it is used for. There is clearly a problem obtaining a lock on this file, though, and that problem was not present with Samba v3.4.7 on the same platform. We did, however, have to patch the server in order to support the Samba package we are using. The cause does not appear to be anything obvious, such as the operating system's limit on open files, and running tdbtool against this particular file produces a similar failure to obtain a lock.
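The first two log entries ("Interrupted system call" followed by "alarm (10) timed out") are consistent with a blocking file lock that is abandoned when an alarm signal fires. The sketch below illustrates that general pattern only; it is not Samba's code. Samba is written in C, and Python is used here for brevity. It assumes the lock in question is a POSIX fcntl lock, and `lock_with_timeout` and `LockTimeout` are hypothetical names chosen for this example.

```python
import fcntl
import os
import signal


class LockTimeout(Exception):
    """Raised when the lock is not granted before the alarm fires."""


def lock_with_timeout(path, timeout_s=10):
    """Take an exclusive fcntl lock on path, giving up after timeout_s seconds.

    Returns the open file descriptor on success; closing it releases the
    lock. Raises LockTimeout if the lock is still held elsewhere when the
    alarm goes off. Must be called from the main thread.
    """
    def on_alarm(signum, frame):
        # Raising here aborts the blocked lockf() call. In C, the same
        # signal would make fcntl(F_SETLKW) fail with errno EINTR, which
        # reads as "Interrupted system call".
        raise LockTimeout(f"alarm ({timeout_s}) timed out for {path}")

    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(timeout_s)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until granted or interrupted
    except BaseException:
        os.close(fd)
        raise
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old_handler)
    return fd
```

Run against a scratch file that another process has locked, this raises `LockTimeout` after the timeout rather than waiting forever. If the same behaviour is seen on mutex.tdb, the useful question becomes which process is holding the lock for longer than ten seconds, rather than why the waiter gave up.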
The server that is experiencing this problem is sun4v architecture and has its storage mounted via NFS from another, central file server. We are running the same Samba package on the central server, which is sun4u, on the same build of Solaris with the same patch cluster, and we don't see this error or a performance problem there. The central server also has more users connected.
I currently have the log level set to 10 on the problematic machine, so I can supply additional log details if required. If there are any suggestions as to what may be causing this issue and how to resolve it, that would be great.