Samba 2.2.0 locking problem (long)

Tony Jago T.Jago at its.uq.edu.au
Wed May 30 01:01:50 GMT 2001


Hi all, 

<executive summary>

I would like some help solving some problems we are having with
the current SAMBA_2_2 cvs tree (and with the 2.2.0 release). We are currently
running samba 2.2 cvs as of about 10pm 29/5/2001 on a Solaris 8 machine.

We are seeing smbd processes get stuck calling fcntl over and over.

</executive summary>


I compiled the code with the following options:
configure --with-quotas --with-profile --with-pam --prefix=/usr/local

We run a fairly complex setup with 4 virtual samba machines:

staff and student: these two are basically the same. They use
encrypt passwords = no and authenticate using a pam module against either
an ldap or kerberos server (this bit works fine).

soefile: is part of a domain called UQ. It uses security = domain to pass
authentication through to an NT PDC (this works fine).

adminfs: is part of the ADMIN domain. Authenticates against an NT server.
(this works fine).

My config files are basically as follows:

smb.conf:

[global]
   workgroup = UQ
   log file = /var/log/samba/smb.%m
   log level = 1
   max log size = 5000

   netbios name = staff
   netbios aliases = student soefile adminfs

   server string = UQ Central File Server
   username map = /usr/local/etc/users.map

   hosts allow = 127. 192.168. 130.102.
   interfaces = 130.102.5.50/24
   bind interfaces only = true
   wins server = 130.102.2.112

   include = /usr/local/etc/smb.conf.%L
   include = /usr/local/etc/smbshares.conf


smb.conf.soefile:
  netbios name = soefile
  security = domain
  password server = *
  encrypt passwords = yes


smb.conf.adminfs:
  workgroup = ADMIN
  netbios name = adminfs
  security = domain
  password server = SOCRATES
  encrypt passwords = yes
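
As a rough illustration of how the first include line combines with the
netbios aliases above: %L is replaced by the NetBIOS name the client
connected to, so each virtual server picks up its own override file. The
helper below is a hypothetical sketch, not Samba's substitution code, and
it only handles %L.

```python
def expand_include(template: str, called_name: str) -> str:
    """Expand the %L macro in an include path (hypothetical helper)."""
    return template.replace("%L", called_name)


# The four names come from "netbios name" and "netbios aliases" in [global].
for name in ("staff", "student", "soefile", "adminfs"):
    print(name, "->", expand_include("/usr/local/etc/smb.conf.%L", name))
```

Only the soefile and adminfs override files are quoted in this post.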



This setup basically works, but I should let you know about a few other
weird things we have here. Nearly all of the clients of the soefile
machine are NT servers running Citrix Metaframe. This means that we are
seeing a great many different users coming from a single NT client and
therefore hitting a single smbd process. This wasn't much of a problem
in samba 2.0.7, except that it overflowed the maximum number of open
files. In Solaris I can increase the number of file descriptors to a very
large number, but descriptors above 255 can't be used with the fopen
library call, so samba needs to use the open system call instead. This is
fine for the most part, as file access does seem to use open, but there
are a few situations, like opening the machine account file, where 2.0.7
was using fopen and this was failing. I haven't seen this exact problem
with 2.2.0, but I have given you all this information to help you
understand our complex setup.
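
The descriptor limit described above can be pictured with a short sketch.
It is in Python rather than C, so it only shows that the low-level open
call keeps handing out descriptors numbered past 255 when the rlimit
allows it. The stdio restriction itself belongs to the C library on the
server and is not reproduced here. The function name and the use of
os.devnull are illustrative choices.

```python
import os
import resource


def first_fd_above(threshold=255):
    """Open descriptors with os.open until one is numbered above threshold.

    Returns that descriptor number, or None if the soft RLIMIT_NOFILE
    (or any other open failure) prevents getting that far.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft <= threshold + 1:
        return None
    opened = []
    try:
        while True:
            # os.open returns the lowest free descriptor number.
            fd = os.open(os.devnull, os.O_RDONLY)
            opened.append(fd)
            if fd > threshold:
                return fd
    except OSError:
        return None
    finally:
        # Always release everything opened for the experiment.
        for fd in opened:
            os.close(fd)


print(first_fd_above(255))
```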


Problem 1: A minor problem is that I need to specify a password server for
the adminfs machine (i.e. I can't use password server = *). I think it may
be using the UQ domain's PDC when I have * selected.

Problem 2: The major problem is that smbd processes are starting to go into
spins looking for a lock.

This is the top of a top display. As you can see, there are currently 3
samba processes in a death spin, each consuming nearly 100% of a cpu (it's
a 16 way box, so 100 / 16 = 6.25).

load averages: 11.00, 11.13, 10.64                                10:37:33
487 processes: 476 sleeping, 2 running, 2 zombie, 7 on cpu
CPU states: 33.2% idle, 25.1% user, 33.8% kernel,  8.0% iowait,  0.0% swap
Memory: 16G real, 6457M free, 4678M swap in use, 6935M swap free

  PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
 9575 nsslapd   18   0    0 2492M 1646M cpu22 687:20  7.28% ns-slapd
 8751 root       1   0    0 5000K 4136K cpu24  15:51  5.33% smbd
29211 root       1   0    0 4824K 3680K cpu17  34:50  5.32% smbd
 8857 root       1   0    0 4696K 3728K cpu31  38:46  5.26% smbd
18199 root     133   0    0  490M  489M run    93.4H  4.62% squid
13890 root       1   0    0 3584K 2792K sleep 493:37  0.90% top
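
The 100 / 16 = 6.25 arithmetic can be turned around to read the CPU column
per processor. This is a minimal sketch that assumes top scales its CPU
column so that the whole 16 way machine adds up to 100%; the function name
is made up for the example.

```python
def per_cpu_utilisation(top_percent, ncpus=16):
    """Convert a whole-machine CPU percentage from top into percent of one CPU."""
    return top_percent * ncpus


# PIDs and percentages are the three smbd rows from the top display.
for pid, pct in ((8751, 5.33), (29211, 5.32), (8857, 5.26)):
    print(pid, round(per_cpu_utilisation(pct), 1))
```

On that reading each of the three smbd processes is using roughly 85% of
one processor.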


If I run truss on one of these processes, it produces this:

# truss -p 8751
    *** SUID: ruid/euid/suid = 0 / 12444 / 12444  ***
    *** SGID: rgid/egid/sgid = 0 / 50 / 50  ***
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0
fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0

(continues to scroll rapidly up the screen)
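
To put a number on a trace like this, the lines can be tallied. The sketch
below is a hypothetical helper and only understands the simple
"name(fd, CMD, arg) = result" shape seen here. Note that every call returns
0, so the F_SETLKW requests are succeeding rather than blocking. truss does
not show the l_type field, so the calls could be alternating lock and
unlock requests.

```python
import re
from collections import Counter

# Matches lines such as: fcntl(13, F_SETLKW, 0xFFBEF218)   = 0
TRUSS_LINE = re.compile(r"^(\w+)\((\d+),\s*(\w+),.*\)\s*=\s*(-?\d+)")


def tally_truss(lines):
    """Count identical (syscall, fd, command, result) tuples in truss output."""
    counts = Counter()
    for line in lines:
        m = TRUSS_LINE.match(line.strip())
        if m:
            name, fd, cmd, result = m.groups()
            counts[(name, int(fd), cmd, int(result))] += 1
    return counts


# The 25 identical lines pasted above.
sample = ["fcntl(13, F_SETLKW, 0xFFBEF218)                 = 0"] * 25
print(tally_truss(sample).most_common(1))
```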

smbstatus is as below (dates trimmed off to fit in 80 char width):

# smbstatus | grep 8751
INFO: Debug class all level = 1   (pid 9319 from pid 9319)
uqbmosed     uqbmosed staff     8751   ps02     (130.102.5.81)
repository   uqschin  staff     8751   ps02     (130.102.5.81)
uqacurra     uqacurra staff     8751   ps02     (130.102.5.81)
repository   vajmarsh staff     8751   ps02     (130.102.5.81)
uqsway       uqsway   staff     8751   ps02     (130.102.5.81)
repository   iosdenga staff     8751   ps02     (130.102.5.81)
uqiholme     uqiholme staff     8751   ps02     (130.102.5.81)
IPC$         iosdenga staff     8751   ps02     (130.102.5.81)
repository   uqlsharm staff     8751   ps02     (130.102.5.81)
gammarsh     gammarsh staff     8751   ps02     (130.102.5.81)
mlwchris     mlwchris staff     8751   ps02     (130.102.5.81)
repository   gammarsh staff     8751   ps02     (130.102.5.81)
uqmschmi     uqmschmi staff     8751   ps02     (130.102.5.81)
uqchoney     uqchoney staff     8751   ps02     (130.102.5.81)
vajmarsh     vajmarsh staff     8751   ps02     (130.102.5.81)
repository   uqbmosed staff     8751   ps02     (130.102.5.81)
uqschin      uqschin  staff     8751   ps02     (130.102.5.81)
IPC$         nobody   nobody    8751   ps02     (130.102.5.81)
uqgbugal     uqgbugal staff     8751   ps02     (130.102.5.81)
iosdenga     iosdenga staff     8751   ps02     (130.102.5.81)
repository   mlwchris staff     8751   ps02     (130.102.5.81)

8751   DENY_ALL   WRONLY     EXCLUSIVE+BATCH  
/export/home/staff/io/iosdenga/BusinessObjects5/UserLibs/extfunct.txt 
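
The listing above is what "many users hitting a single smbd" looks like in
practice. A small sketch for counting the users behind each pid, assuming
the trimmed column layout shown here (service, user, group, pid, machine,
address); the function name is illustrative and the sample is a few rows
from the listing.

```python
from collections import defaultdict


def users_per_pid(lines):
    """Map each smbd pid to the set of users connected through it."""
    users = defaultdict(set)
    for line in lines:
        fields = line.split()
        # Expected trimmed layout: service user group pid machine (ip)
        if len(fields) >= 6 and fields[3].isdigit():
            users[int(fields[3])].add(fields[1])
    return users


sample = [
    "uqbmosed     uqbmosed staff     8751   ps02     (130.102.5.81)",
    "repository   uqschin  staff     8751   ps02     (130.102.5.81)",
    "IPC$         nobody   nobody    8751   ps02     (130.102.5.81)",
    "repository   uqbmosed staff     8751   ps02     (130.102.5.81)",
]
print({pid: sorted(names) for pid, names in users_per_pid(sample).items()})
```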


output of pfiles:

# pfiles 8751
8751:   /usr/local/sbin/smbd
  Current rlimit: 10010 file descriptors
   0: S_IFCHR mode:0666 dev:85,10 ino:223907 uid:0 gid:3 rdev:13,2
      O_RDWR|O_LARGEFILE
   1: S_IFCHR mode:0666 dev:85,10 ino:223907 uid:0 gid:3 rdev:13,2
      O_RDWR|O_LARGEFILE
   2: S_IFCHR mode:0666 dev:85,10 ino:223907 uid:0 gid:3 rdev:13,2
      O_RDWR|O_LARGEFILE
   3: S_IFDOOR mode:0444 dev:230,0 ino:38738 uid:0 gid:0 size:0
      O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to nscd[853]
   4: S_IFSOCK mode:0666 dev:225,0 ino:8245 uid:0 gid:0 size:0
      O_RDWR
        sockname: AF_INET 130.102.5.50  port: 139
        peername: AF_INET 130.102.5.81  port: 1306
   5: S_IFREG mode:0600 dev:75,29001 ino:14069 uid:0 gid:0 size:8192
      O_RDWR
   6: S_IFREG mode:0644 dev:75,29001 ino:14070 uid:0 gid:0 size:20
      O_WRONLY|O_NONBLOCK|O_LARGEFILE
      advisory write lock set by process 26646
   7: S_IFREG mode:0600 dev:75,29001 ino:14065 uid:0 gid:0 size:696
      O_RDWR
      advisory read lock set by process 26646
   8: S_IFREG mode:0644 dev:75,29001 ino:189762 uid:0 gid:0 size:565248
      O_RDWR
      advisory read lock set by process 26646
   9: S_IFREG mode:0644 dev:75,29001 ino:194155 uid:0 gid:0 size:8192
      O_RDWR
      advisory read lock set by process 13970
  10: S_IFIFO mode:0000 dev:226,0 ino:12873147 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
  11: S_IFIFO mode:0000 dev:226,0 ino:12873147 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
  12: S_IFSOCK mode:0666 dev:225,0 ino:49209 uid:0 gid:0 size:0
      O_RDWR
        sockname: AF_INET 127.0.0.1  port: 44843
  13: S_IFREG mode:0644 dev:75,29001 ino:194170 uid:0 gid:0 size:90112
      O_RDWR
      advisory read lock set by process 13970
  14: S_IFREG mode:0600 dev:75,29001 ino:194178 uid:0 gid:0 size:8192
      O_RDWR
      advisory read lock set by process 13970
  15: S_IFREG mode:0600 dev:75,29001 ino:194199 uid:0 gid:0 size:8192
      O_RDWR
      advisory read lock set by process 13970
  16: S_IFREG mode:0600 dev:75,29001 ino:194292 uid:0 gid:0 size:8192
      O_RDWR
      advisory read lock set by process 13970
  18: S_IFIFO mode:0000 dev:226,0 ino:13031393 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
  19: S_IFIFO mode:0000 dev:226,0 ino:13031393 uid:0 gid:0 size:0
      O_RDWR|O_NONBLOCK
  20: S_IFREG mode:0644 dev:75,29001 ino:194573 uid:0 gid:0 size:57344
      O_RDWR
      advisory read lock set by process 13970
  21: S_IFREG mode:0644 dev:75,29000 ino:825 uid:0 gid:0 size:1110905
      O_WRONLY|O_APPEND|O_LARGEFILE


# lsof -p 8751
COMMAND  PID     USER   FD   TYPE        DEVICE   SIZE/OFF     NODE NAME
smbd    8751 iosdenga  cwd   VDIR      75,43000       8192    51934 /export/home/staff/io/iosdenga
smbd    8751 iosdenga  txt   VREG      75,29001    1714140   222289 /usr/local/sbin/smbd
smbd    8751 iosdenga  txt   VREG         85,30      44892     3943 /usr/lib/nss_files.so.1
smbd    8751 iosdenga  txt   VREG      75,29001     565248   189762 /usr/local (/dev/vx/dsk/soetrindg/soetrinlocal)
smbd    8751 iosdenga  txt-r VREG      75,29001      57344   194573 /usr/local -- sessionid.tdb
smbd    8751 iosdenga  txt-r VREG      75,29001       8192   194292 /usr/local -- share_info.tdb
smbd    8751 iosdenga  txt-r VREG      75,29001       8192   194199 /usr/local -- ntdrivers.tdb
smbd    8751 iosdenga  txt-r VREG      75,29001       8192   194178 /usr/local (/dev/vx/dsk/soetrindg/soetrinlocal)
smbd    8751 iosdenga  txt-r VREG      75,29001      90112   194170 /usr/local -- locking.tdb
smbd    8751 iosdenga  txt-r VREG      75,29001       8192   194155 /usr/local -- brlock.tdb
smbd    8751 iosdenga  txt   VREG      75,29001        696    14065 /usr/local (/dev/vx/dsk/soetrindg/soetrinlocal)
smbd    8751 iosdenga  txt   VREG         85,30      17096   247619 /usr/platform/sun4u/lib/libc_psr.so.1
smbd    8751 iosdenga  txt   VREG         85,30      24968     3578 /usr/lib/libmp.so.2
smbd    8751 iosdenga  txt   VREG         85,30    1129948     3705 /usr/lib/libc.so.1
smbd    8751 iosdenga  txt   VREG         85,30      36844     3582 /usr/lib/libpam.so.1
smbd    8751 iosdenga  txt   VREG      75,29001       8192    14069 /usr/local (/dev/vx/dsk/soetrindg/soetrinlocal)
smbd    8751 iosdenga  txt   VREG         85,30     888508     3677 /usr/lib/libnsl.so.1
smbd    8751 iosdenga  txt   VREG         85,30      70260     3599 /usr/lib/libsocket.so.1
smbd    8751 iosdenga  txt   VREG         85,30      42184     3563 /usr/lib/libgen.so.1
smbd    8751 iosdenga  txt   VREG         85,30      22964     3596 /usr/lib/libsec.so.1
smbd    8751 iosdenga  txt   VREG         85,30       4624     3535 /usr/lib/libdl.so.1
smbd    8751 iosdenga  txt   VREG         85,30     195104     3594 /usr/lib/ld.so.1
smbd    8751 iosdenga    0u  VCHR          13,2        0t0   223907 /devices/pseudo/mm@0:null
smbd    8751 iosdenga    1u  VCHR          13,2        0t0   223907 /devices/pseudo/mm@0:null
smbd    8751 iosdenga    2u  VCHR          13,2        0t0   223907 /devices/pseudo/mm@0:null
smbd    8751 iosdenga    3r  DOOR     0,3563983        0t0    38738 door to nscd[853]
smbd    8751 iosdenga    4u  IPv4 0x300fcfa9228 0t38776288      TCP matrix1.cc.uq.edu.au:netbios-ssn->ps02.soe.uq.edu.au:1306 (CLOSE_WAIT)
smbd    8751 iosdenga    5u  VREG      75,29001       8192    14069 /usr/local (/dev/vx/dsk/soetrindg/soetrinlocal)
smbd    8751 iosdenga    6w  VREG      75,29001         20    14070 /usr/local (/dev/vx/dsk/soetrindg/soetrinlocal)
smbd    8751 iosdenga    7u  VREG      75,29001        696    14065 /usr/local (/dev/vx/dsk/soetrindg/soetrinlocal)
smbd    8751 iosdenga    8u  VREG      75,29001     565248   189762 /usr/local (/dev/vx/dsk/soetrindg/soetrinlocal)
smbd    8751 iosdenga    9ur VREG      75,29001       8192   194155 /usr/local -- brlock.tdb
smbd    8751 iosdenga   10u  FIFO 0x3001b1c2a20        0t0 12873147 PIPE->0x3001b1c2b08
smbd    8751 iosdenga   11u  FIFO 0x3001b1c2b08        0t0 12873147 PIPE->0x3001b1c2a20
smbd    8751 iosdenga   12u  IPv4 0x3010fce0198        0t0      UDP localhost:44843 (Idle)
smbd    8751 iosdenga   13ur VREG      75,29001      90112   194170 /usr/local -- locking.tdb
smbd    8751 iosdenga   14ur VREG      75,29001       8192   194178 /usr/local (/dev/vx/dsk/soetrindg/soetrinlocal)
smbd    8751 iosdenga   15ur VREG      75,29001       8192   194199 /usr/local -- ntdrivers.tdb
smbd    8751 iosdenga   16ur VREG      75,29001       8192   194292 /usr/local -- share_info.tdb
smbd    8751 iosdenga   18u  FIFO 0x3010a071e20        0t0 13031393 PIPE->0x3010a071f08
smbd    8751 iosdenga   19u  FIFO 0x3010a071f08        0t0 13031393 PIPE->0x3010a071e20
smbd    8751 iosdenga   20ur VREG      75,29001      57344   194573 /usr/local -- sessionid.tdb
smbd    8751 iosdenga   21w  VREG      75,29000    1113425      825 /var/log/samba/smb.ps02
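
The pfiles and lsof listings can be joined on the inode number to see which
file the spinning fcntl(13, ...) calls refer to. The sketch below uses
hypothetical helper names and only handles the line shapes pasted above.
For fd 13 it pairs inode 194170 with locking.tdb and with the advisory read
lock that pfiles attributes to process 13970.

```python
import re

PFILES_FD = re.compile(r"^\s*(\d+): \S+ mode:\d+ dev:\S+ ino:(\d+)")
PFILES_LOCK = re.compile(r"advisory (read|write) lock set by process (\d+)")


def locks_by_fd(pfiles_lines):
    """Return {fd: (inode, lock_type, holder_pid)} for descriptors with a lock line."""
    result, current = {}, None
    for line in pfiles_lines:
        m = PFILES_FD.match(line)
        if m:
            current = (int(m.group(1)), int(m.group(2)))
            continue
        m = PFILES_LOCK.search(line)
        if m and current:
            fd, inode = current
            result[fd] = (inode, m.group(1), int(m.group(2)))
    return result


def names_by_inode(lsof_lines):
    """Return {inode: name} for regular-file (VREG) rows of lsof output."""
    names = {}
    for line in lsof_lines:
        f = line.split()
        if len(f) >= 9 and f[4] == "VREG" and f[7].isdigit():
            names[int(f[7])] = " ".join(f[8:])
    return names


pfiles_sample = [
    "  13: S_IFREG mode:0644 dev:75,29001 ino:194170 uid:0 gid:0 size:90112",
    "      O_RDWR",
    "      advisory read lock set by process 13970",
]
lsof_sample = [
    "smbd    8751 iosdenga   13ur VREG      75,29001      90112   194170 /usr/local -- locking.tdb",
]

inode, lock_type, holder = locks_by_fd(pfiles_sample)[13]
print(13, names_by_inode(lsof_sample)[inode], lock_type, holder)
```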


Interesting bits from the smb.ps02 log:

[2001/05/30 10:52:33, 0] smbd/posix_acls.c:create_canon_ace_lists(702)
  create_canon_ace_lists: unable to map SID S-1-5-32-544 to uid or gid.
[2001/05/30 10:52:33, 0] smbd/posix_acls.c:create_canon_ace_lists(702)
  create_canon_ace_lists: unable to map SID S-1-5-32-544 to uid or gid.
[2001/05/30 10:52:33, 0] smbd/posix_acls.c:create_canon_ace_lists(702)
  create_canon_ace_lists: unable to map SID S-1-5-32-544 to uid or gid.
[2001/05/30 10:52:33, 0] smbd/posix_acls.c:create_canon_ace_lists(702)
  create_canon_ace_lists: unable to map SID S-1-5-32-544 to uid or gid.
[2001/05/30 10:52:33, 0] smbd/posix_acls.c:create_canon_ace_lists(702)
  create_canon_ace_lists: unable to map SID S-1-5-32-544 to uid or gid.
[2001/05/30 10:52:33, 0] smbd/posix_acls.c:create_canon_ace_lists(702)
  create_canon_ace_lists: unable to map SID S-1-5-32-544 to uid or gid.
[2001/05/30 10:52:35, 1] smbd/service.c:make_connection(633)
  ps02 (130.102.5.81) connect to service edlcowan as user edlcowan (uid=12677, gid=50) (pid 26937)
[2001/05/30 10:52:36, 1] smbd/service.c:make_connection(633)
  ps02 (130.102.5.81) connect to service repository as user edlcowan (uid=12677, gid=5


[2001/05/30 10:53:41, 0] smbd/oplock.c:request_oplock_break(995)
  request_oplock_break: no response received to oplock break request to pid 8751 on port 44843 for dev = 12ca7f8, inode = 1000248
  for dev = 12ca7f8, inode = 1000248, tv_sec = 3b143c9c, tv_usec = d4f38
[2001/05/30 10:54:13, 0] smbd/oplock.c:request_oplock_break(995)
  request_oplock_break: no response received to oplock break request to pid 8751 on port 44843 for dev = 12ca7f8, inode = 1000248
  for dev = 12ca7f8, inode = 1000248, tv_sec = 3b143c9c, tv_usec = d4f38
[2001/05/30 10:54:45, 0] smbd/oplock.c:request_oplock_break(995)
  request_oplock_break: no response received to oplock break request to pid 8751 on port 44843 for dev = 12ca7f8, inode = 1000248
  for dev = 12ca7f8, inode = 1000248, tv_sec = 3b143c9c, tv_usec = d4f38
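
The dev, tv_sec and tv_usec values in these messages look like hex. A
sketch for decoding them follows, with two loudly stated assumptions: that
dev is a 32-bit Solaris dev_t with a 14-bit major and 18-bit minor number,
and that tv_sec/tv_usec are an ordinary Unix timestamp. What the timestamp
refers to is not stated in the log. The function name is made up.

```python
from datetime import datetime, timezone


def decode_oplock_fields(dev_hex, tv_sec_hex, tv_usec_hex):
    """Decode hex fields from the request_oplock_break log line.

    The 14-bit major / 18-bit minor split is an assumed dev_t layout.
    """
    dev = int(dev_hex, 16)
    major, minor = dev >> 18, dev & 0x3FFFF
    when = datetime.fromtimestamp(int(tv_sec_hex, 16), tz=timezone.utc)
    when = when.replace(microsecond=int(tv_usec_hex, 16))
    return (major, minor), when


device, when = decode_oplock_fields("12ca7f8", "3b143c9c", "d4f38")
print(device, when.isoformat())
```

Under those assumptions the device decodes to 75,43000, the same device
lsof reports for /export/home/staff/io/iosdenga, and the time to 00:19:40
UTC on 30 May 2001, which would be 10:19:40 if the server clock is on
Brisbane time (UTC+10).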


Any assistance at all with this problem would be greatly appreciated. And
if you are still looking for 2.2.1 release candidate problems, I would class
this as a major one.

Thanks in advance for your help,

  Tony

---
Tony Jago, System Administrator,        E-Mail: T.Jago at its.uq.edu.au
Server and Security Group,               Phone: +61 7 33654078
Information Technology Services,           Fax: +61 7 33654065
The University of Queensland.  Brisbane, Australia. 4072.






