smbd process block
Sam Liddicott
sam at liddicott.com
Mon May 11 10:56:58 GMT 2009
I wonder if it is because samba 3 has a limit on the number of pending
requests and doesn't recognize that this should perhaps not apply to
change notify requests.
Anyone know?
Sam
* Goran Lowkrantz wrote, On 07/04/09 17:52:
> Hi Sam,
>
> I have tested accessing the directory with windows explorer but that
> too goes Not responding. I have touched both an old and a new file but
> that didn't change anything.
>
> here is the lsof of the unresponsive smbd:
>> lsof -p 71672
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> smbd 71672 root cwd VDIR 17,910622967 10 3
> /usr/data/glz/imatch
> smbd 71672 root rtd VDIR 0,104 512 2 /
> smbd 71672 root txt can't
> read vnode at 0x006c96c7
> smbd 71672 root txt can't
> read vnode at 0x00030230
> smbd 71672 root txt can't
> read vnode at 0x00020000
> smbd 71672 root txt can't
> read vnode at 0x00010000
> smbd 71672 root txt can't
> read vnode at 0x000002b8
> smbd 71672 root txt can't
> read vnode at 0x00003000
> smbd 71672 root txt can't
> read vnode at 0x00016000
> smbd 71672 root txt can't
> read vnode at 0x00009d08
> smbd 71672 root txt can't
> read vnode at 0x00007000
> smbd 71672 root txt can't
> read vnode at 0x00002000
> smbd 71672 root txt can't
> read vnode at 0x00028000
> smbd 71672 root txt can't
> read vnode at 0x00025000
> smbd 71672 root txt can't
> read vnode at 0x00043a61
> smbd 71672 root txt can't
> read vnode at 0x00011118
> smbd 71672 root txt can't
> read vnode at 0x000c035c
> smbd 71672 root txt can't
> read vnode at 0x0004ab40
> smbd 71672 root txt can't
> read vnode at 0x00193d40
> smbd 71672 root txt can't
> read vnode at 0x00014900
> smbd 71672 root txt can't
> read vnode at 0x0001c1e0
> smbd 71672 root txt can't
> read vnode at 0x00008070
> smbd 71672 root txt can't
> read vnode at 0x00008bf0
> smbd 71672 root txt can't
> read vnode at 0x0000e698
> smbd 71672 root txt can't
> read vnode at 0x001077f4
> smbd 71672 root txt can't
> read vnode at 0x00005290
> smbd 71672 root txt can't
> read vnode at 0x00008ced
> smbd 71672 root txt can't
> read vnode at 0x0011f688
> smbd 71672 root txt can't
> read vnode at 0x0001bf2b
> smbd 71672 root txt can't
> read vnode at 0x0000b7b1
> smbd 71672 root txt can't
> read vnode at 0x00014058
> smbd 71672 root txt can't
> read vnode at 0x00001dd0
> smbd 71672 root 0u VCHR 0,20 0t0 20
> /dev/null
> smbd 71672 root 1u VCHR 0,20 0t0 20
> /dev/null
> smbd 71672 root 2w VREG 129,4254531705 31595 630
> /var/log/log.tor
> smbd 71672 root 3r VCHR 0,27 0t9 27
> /dev/random
> smbd 71672 root 4u VREG 205,2015822061 12288 47036
> /usr/local -- etc/samba/secrets.tdb
> smbd 71672 root 5u VREG 203,3580952781 696 11659 /var
> (system/var)
> smbd 71672 root 6u PIPE 0xffffff010c6972e8 16384
> ->0xffffff010c697440
> smbd 71672 root 7w VREG 203,3580952781 6 11664 /var
> (system/var)
> smbd 71672 root 8u IPv4 0xffffff0065943b60 0t0 TCP
> midgard:microsoft-ds->tor:53116 (ESTABLISHED)
> smbd 71672 root 9u VREG 203,3580952781 163840 11665 /var
> (system/var)
> smbd 71672 root 10u VREG 203,3580952781 90112 11661 /var
> (system/var)
> smbd 71672 root 11u VREG 203,3580952781 40200 11666 /var
> (system/var)
> smbd 71672 root 12u VREG 203,3580952781 151552 11667 /var
> (system/var)
> smbd 71672 root 13u VREG 203,3580952781 8192 11662 /var
> (system/var)
> smbd 71672 root 14u VREG 203,3580952781 8192 1052 /var
> (system/var)
> smbd 71672 root 15u VREG 203,3580952781 8192 1013 /var
> (system/var)
> smbd 71672 root 16u VREG 203,3580952781 12288 1053 /var
> (system/var)
> smbd 71672 root 17u VREG 203,3580952781 696 1054 /var
> (system/var)
> smbd 71672 root 18u PIPE 0xffffff00096c3000 16384
> ->0xffffff00096c3158
> smbd 71672 root 20u PIPE 0xffffff010c697440 0
> ->0xffffff010c6972e8
> smbd 71672 root 21u VREG 203,3580952781 8192 1056 /var
> (system/var)
> smbd 71672 root 22u VREG 203,3580952781 696 960 /var
> (system/var)
> smbd 71672 root 23u PIPE 0xffffff010c6ff8b8 16384
> ->0xffffff010c6ffa10
> smbd 71672 root 24u PIPE 0xffffff010c6ffa10 0
> ->0xffffff010c6ff8b8
> smbd 71672 root 25w VREG 129,4254531705 31595 630
> /var/log/log.tor
> smbd 71672 root 26ur VREG 203,3580952781 28672 973 /var
> (system/var)
> smbd 71672 root 27u unix 0xffffff01982e5000 0t0
> ->0xffffff0164d145a0
>
> As you can see the process in owned by root, not the logged in user.
>
> /glz
>
> --On Tuesday, April 07, 2009 3:13 PM +0100 Sam Liddicott
> <sam at liddicott.com> wrote:
>
>> * Goran Lowkrantz wrote, On 07/04/09 14:57:
>>> Hi Sam,
>>>
>>> Each of the calls look exactly the same except the multiplex id, i.e.
>>> it's the same FID and path, which is a leaf directory containing 202
>>> files. And there is only 48 notify calls before it stops, so I can't
>>> see how that maps to the total of about 400 directories in the watched
>>> tree. just looking at leaf directories give at least 300 directories
>>> to watch.
>> My mistake, I thought you said 245 calls, but I see now that you said
>> 245 seconds.
>>>
>>> The thing I see is that the application issuing these notify requests
>>> go unresponsive after about 10 to 15 min, the frame marked Not
>>> responding. I assume this is because it is waiting for some response
>>> from the network, as killing the smbd service releases the application.
>> Try running some other application on the same share and see if the
>> share is responding - even windows explorer.
>>>
>>> And these notify setup messages are the only one I can find that do
>>> not have an SMB response.
>> This is quite normal for notify requests.
>>
>> Try adding another file to a watched folder from another windows machine
>> and see if the application comes to life when it gets the notification
>> response back.
>>
>> Sam
>>>
>>> /glz
>>>
>>> --On April 7, 2009 14:33:54 +0100 Sam Liddicott <sam at liddicott.com>
>>> wrote:
>>>
>>>> * Goran Lowkrantz wrote, On 07/04/09 14:08:
>>>>> Hi Sam,
>>>>>
>>>>> Looking further in the dump, I find that a series of these:
>>>>> NT Trans Request (0xa0)
>>>>> NT NOTIFY Setup
>>>>> Completion Filter: 0x00000003
>>>>> FID: 0x32f6 (\Input\2006\200607\20060729)
>>>>>
>>>>> but no response from the Samba server, only IP ACK. The FID look OK
>>>>> and direct queries on the FID about the directory is answered
>>>>> earlier.
>>>>>
>>>>> These messages are sent from the client every 5s for 245s. Then they
>>>>> stop.
>>>> I guess this may be once for every folder of 245 monitored folders.
>>>>
>>>> A response is not supposed to be given until the request is
>>>> cancelled or
>>>> until a monitored file or folder changes; so this seems quite normal.
>>>>
>>>> What behaviour or action (or lack of) on the windows client makes you
>>>> think that samba has hung?
>>>>
>>>> Sam
>>>>>
>>>>> /glz
>>>>>
>>>>> --On April 6, 2009 15:28:31 +0100 Sam Liddicott <sam at liddicott.com>
>>>>> wrote:
>>>>>
>>>>>> As far as I can tell, samba is not hanging, it's issuing
>>>>>> responses to
>>>>>> all the requests it gets.
>>>>>>
>>>>>> I think that the windows client thinks that samba has hung, because
>>>>>> the
>>>>>> sending of Echo Request is a sign that the client is getting fed
>>>>>> up of
>>>>>> waiting and is checking to see if the link is still working.
>>>>>>
>>>>>> So it looks like the client is waiting for some kind of response,
>>>>>> but I
>>>>>> haven't been able to spot a response that wasn't answered.
>>>>>>
>>>>>> You load the pcap into wireshark and see if you can spot a request
>>>>>> that
>>>>>> doesn't get a response...
>>>>>>
>>>>>> Sam
>>>>>>
>>>>>> * Goran Lowkrantz wrote, On 16/02/09 22:42:
>>>>>>>
>>>>>>> I have few Samba servers running FreeBSD 7.1 were we have a problem
>>>>>>> with smbd process blocking for a few Vista systems that run a
>>>>>>> program
>>>>>>> that watch directories and files on the samba shares.
>>>>>>>
>>>>>>> On my test setup I have managed to get a hang in less than 30 min.
>>>>>>>
>>>>>>> Samba 3.2.7 is built with minimum functions and full debug. Options
>>>>>>> don't seems to have any impact on the problem.
>>>>>>>
>>>>>>> The PC uses Vista Business SP1 and all patches, I run a DAM program
>>>>>>> called IMatch ver 3.6.076 that watches for changes in the photo
>>>>>>> database. No other application I have tested has the same problems.
>>>>>>> For example, Adobe Lightroom 2.2 works without problems when setup
>>>>>>> with a watched folder.
>>>>>>>
>>>>>>> I have attached logfiles from samba with the following extra
>>>>>>> settings:
>>>>>>> debug pid = yes
>>>>>>> debug timestamp = no
>>>>>>> debug prefix timestamp = yes
>>>>>>> debug uid = yes
>>>>>>> log level = 10
>>>>>>>
>>>>>>> The PID of the server that hangs is 29162.
>>>>>>>
>>>>>>> The FreeBSD server is an up-to-date quad AMD server with 8GB
>>>>>>> running
>>>>>>> 7.1-STABLE. In normal operation, I see the following:
>>>>>>>
>>>>>>> # sockstat | grep 445
>>>>>>> glz smbd 7828 23 tcp4 10.255.253.1:445
>>>>>>> 10.255.253.2:57355
>>>>>>> root smbd 76917 19 tcp4 127.0.0.1:445 *:*
>>>>>>> root smbd 76917 20 tcp4 10.255.253.1:445 *:*
>>>>>>>
>>>>>>> When I get the hang, it looks like this:
>>>>>>> # sockstat | grep 445
>>>>>>> root smbd 7828 23 tcp4 10.255.253.1:445
>>>>>>> 10.255.253.2:57355
>>>>>>> root smbd 76917 19 tcp4 127.0.0.1:445 *:*
>>>>>>> root smbd 76917 20 tcp4 10.255.253.1:445 *:*
>>>>>>>
>>>>>>> and the GDB session:
>>>>>>> # gdb /usr/local/sbin/smbd 7828
>>>>>>> GNU gdb 6.1.1 [FreeBSD]
>>>>>>> Copyright 2004 Free Software Foundation, Inc.
>>>>>>> GDB is free software, covered by the GNU General Public License,
>>>>>>> and
>>>>>>> you are
>>>>>>> welcome to change it and/or distribute copies of it under certain
>>>>>>> conditions.
>>>>>>> Type "show copying" to see the conditions.
>>>>>>> There is absolutely no warranty for GDB. Type "show warranty" for
>>>>>>> details.
>>>>>>> This GDB was configured as "amd64-marcel-freebsd"...
>>>>>>> Attaching to program: /usr/local/sbin/smbd, process 7828
>>>>>>> Reading symbols from /usr/local/lib/libldap-2.3.so.2...done.
>>>>>>> Loaded symbols for /usr/local/lib/libldap-2.3.so.2
>>>>>>> Reading symbols from /usr/local/lib/liblber-2.3.so.2...done.
>>>>>>> Loaded symbols for /usr/local/lib/liblber-2.3.so.2
>>>>>>> Reading symbols from /usr/local/lib/libcups.so.2...done.
>>>>>>> Loaded symbols for /usr/local/lib/libcups.so.2
>>>>>>> Reading symbols from /usr/lib/libssl.so.5...done.
>>>>>>> Loaded symbols for /usr/lib/libssl.so.5
>>>>>>> Reading symbols from /lib/libcrypto.so.5...done.
>>>>>>> Loaded symbols for /lib/libcrypto.so.5
>>>>>>> Reading symbols from /lib/libz.so.4...done.
>>>>>>> Loaded symbols for /lib/libz.so.4
>>>>>>> Reading symbols from /lib/libm.so.5...done.
>>>>>>> Loaded symbols for /lib/libm.so.5
>>>>>>> Reading symbols from /lib/libcrypt.so.4...done.
>>>>>>> Loaded symbols for /lib/libcrypt.so.4
>>>>>>> Reading symbols from /usr/lib/libpam.so.4...done.
>>>>>>> Loaded symbols for /usr/lib/libpam.so.4
>>>>>>> Reading symbols from /usr/local/lib/libexecinfo.so.1...done.
>>>>>>> Loaded symbols for /usr/local/lib/libexecinfo.so.1
>>>>>>> Reading symbols from /usr/local/lib/libiconv.so.3...done.
>>>>>>> Loaded symbols for /usr/local/lib/libiconv.so.3
>>>>>>> Reading symbols from /usr/local/lib/libdmalloc.so.1...done.
>>>>>>> Loaded symbols for /usr/local/lib/libdmalloc.so.1
>>>>>>> Reading symbols from /usr/local/lib/libpopt.so.0...done.
>>>>>>> Loaded symbols for /usr/local/lib/libpopt.so.0
>>>>>>> Reading symbols from /lib/libthr.so.3...done.
>>>>>>> [New Thread 0x800a62e00 (LWP 100076)]
>>>>>>> Loaded symbols for /lib/libthr.so.3
>>>>>>> Reading symbols from /lib/libc.so.7...done.
>>>>>>> Loaded symbols for /lib/libc.so.7
>>>>>>> Reading symbols from /usr/local/lib/libsasl2.so.2...done.
>>>>>>> Loaded symbols for /usr/local/lib/libsasl2.so.2
>>>>>>> Reading symbols from /usr/local/lib/libintl.so.8...done.
>>>>>>> Loaded symbols for /usr/local/lib/libintl.so.8
>>>>>>> Reading symbols from /usr/local/lib/nss_ldap.so.1...done.
>>>>>>> Loaded symbols for /usr/local/lib/nss_ldap.so.1
>>>>>>> Reading symbols from /usr/lib/libcom_err.so.4...done.
>>>>>>> Loaded symbols for /usr/lib/libcom_err.so.4
>>>>>>> Reading symbols from /libexec/ld-elf.so.1...done.
>>>>>>> Loaded symbols for /libexec/ld-elf.so.1
>>>>>>> [Switching to Thread 0x800a62e00 (LWP 100076)]
>>>>>>> 0x0000000801f01d6c in select () from /lib/libc.so.7
>>>>>>> (gdb) directory
>>>>>>> /usr/ports/net/samba32-devel/work/samba-3.2.7/source/
>>>>>>> Source directories searched:
>>>>>>> /usr/ports/net/samba32-devel/work/samba-3.2.7/source:$cdir:$cwd
>>>>>>> (gdb) bt
>>>>>>> # 0 0x0000000801f01d6c in select () from /lib/libc.so.7
>>>>>>> # 1 0x0000000801d0f4d4 in select () from /lib/libthr.so.3
>>>>>>> # 2 0x00000000006749fe in sys_select (maxfd=24,
>>>>>>> readfds=0x7fffffffd420, writefds=0x7fffffffd3a0, errorfds=0x0,
>>>>>>> tval=0x7fffffffd500)
>>>>>>> at lib/select.c:93
>>>>>>> # 3 0x00000000004df64c in smbd_process () at smbd/process.c:839
>>>>>>> # 4 0x0000000000854074 in main (argc=2, argv=0x7fffffffd638) at
>>>>>>> smbd/server.c:1450
>>>>>>> (gdb) frame 2
>>>>>>> # 2 0x00000000006749fe in sys_select (maxfd=24,
>>>>>>> readfds=0x7fffffffd420, writefds=0x7fffffffd3a0, errorfds=0x0,
>>>>>>> tval=0x7fffffffd500)
>>>>>>> at lib/select.c:93
>>>>>>> 93 ret =
>>>>>>> select(maxfd,readfds2,writefds,errorfds,tval);
>>>>>>> (gdb) print tval
>>>>>>> $1 = (struct timeval *) 0x7fffffffd500
>>>>>>> (gdb) print *tval
>>>>>>> $2 = {tv_sec = 59, tv_usec = 999977}
>>>>>>> (gdb) The program is running. Quit anyway (and detach it)? (y or
>>>>>>> n) y
>>>>>>> Detaching from program: /usr/local/sbin/smbd, process 7828
>>>>>>>
>>>>>>> The following is a truss of the process until I have seen the
>>>>>>> switch
>>>>>>> to root as owner:
>>>>>>> # time truss -p 8307
>>>>>>> gettimeofday({1234648077.989004 },0x0) = 0 (0x0)
>>>>>>> gettimeofday({1234648077.989081 },0x0) = 0 (0x0)
>>>>>>> select(24,{6 23},{},0x0,{21.288167 }) = 0 (0x0)
>>>>>>> gettimeofday({1234648099.279293 },0x0) = 0 (0x0)
>>>>>>> gettimeofday({1234648099.279370 },0x0) = 0 (0x0)
>>>>>>> gettimeofday({1234648099.279417 },0x0) = 0 (0x0)
>>>>>>> select(24,{6 23},{},0x0,{59.989982 }) = 1 (0x1)
>>>>>>> gettimeofday({1234648102.286493 },0x0) = 0 (0x0)
>>>>>>> read(23,"\0\0\0r",4) = 4 (0x4)
>>>>>>> read(23,"\M^?SMB2\0\0\0\0\^X\a\M-H\0\0\0"...,114) = 114 (0x72)
>>>>>>> geteuid(0x3e8,0x3e8,0x2,0x800adf750,0x2,0x800adf750) = 0 (0x0)
>>>>>>> getegid(0x3e8,0x3e8,0x2,0x801eadb8c,0xffffff006cf16a50,0x7fffffffd13
>>>>>>>
>>>>>>> 8)
>>>>>>>
>>>>>>> = 0 (0x0)
>>>>>>> __sysctl(0x7fffffffd0a0,0x2,0x7fffffffd0bc,0x7fffffffd0b0,0x0,0x0)
>>>>>>> = 0
>>>>>>> (0x0)
>>>>>>> 0.000u 0.001s 2:36.56 0.0% 0+0k 0+0io 0pf+0w
>>>>>>>
>>>>>>> # sockstat | grep 445
>>>>>>> glz smbd 8307 23 tcp4 10.255.253.1:445
>>>>>>> 10.255.253.2:57438
>>>>>>> root smbd 76917 19 tcp4 127.0.0.1:445 *:*
>>>>>>> root smbd 76917 20 tcp4 10.255.253.1:445 *:*
>>>>>>> # ps -awxl | grep 8307
>>>>>>> 1000 8307 8556 0 44 0 34672 7984 select IX ?? 0:04.57
>>>>>>> /usr/local/sbin/smbd -D -s /usr/local/etc/smb.conf
>>>>>>> 0 8556 3273 0 8 0 4600 1204 wait I+ p0 0:00.00
>>>>>>> truss -p 8307
>>>>>>> # sockstat | grep 445
>>>>>>> root smbd 8307 23 tcp4 10.255.253.1:445
>>>>>>> 10.255.253.2:57438
>>>>>>> root smbd 76917 19 tcp4 127.0.0.1:445 *:*
>>>>>>> root smbd 76917 20 tcp4 10.255.253.1:445 *:*
>>>>>>> # ps -awxl | grep 8307
>>>>>>> 0 8307 8556 0 44 0 34672 7984 select SX ?? 0:04.57
>>>>>>> /usr/local/sbin/smbd -D -s /usr/local/etc/smb.conf
>>>>>>> 0 8556 3273 0 8 0 4600 1204 wait I+ p0 0:00.00
>>>>>>> truss -p 8307
>>>>>>>
>>>>>>>
>>>>>>> I can recreate this at any time and the condition by killing the
>>>>>>> offending smbd process and the PC reconnects just fine.
>>>>>>>
>>>>>>> Hope this helps to pin this down. As I can recreate the hang,
>>>>>>> please
>>>>>>> let me know if there is any more information I can supply.
>>>>>>>
>>>>>>> /glz
>>>>>>>
>>>>>>> ................................................... the future
>>>>>>> isMobile
>>>>>>>
>>>>>>> Goran Lowkrantz <goran.lowkrantz at ismobile.com>
>>>>>>> System Architect, isMobile, Aurorum 2, S-977 75 Luleå, Sweden
>>>>>>> Phone: +46(0)920-75559
>>>>>>> Mobile: +46(0)70-587 87 82 Fax: +46(0)70-615 87 82
>>>>>>>
>>>>>>> http://www.ismobile.com
>>>>>>> ...............................................
>>>>>
>>>>>
>>>>>
>>>>> ................................................... the future
>>>>> isMobile
>>>>>
>>>>> Goran Lowkrantz <goran.lowkrantz at ismobile.com>
>>>>> System Architect, isMobile AB
>>>>> Sandviksgatan 81, PO Box 58, S-971 03 Luleå, Sweden
>>>>> Mobile: +46(0)70-587 87 82
>>>>> http://www.ismobile.com
>>>>> ...............................................
>>>
>>>
>>>
>>> ................................................... the future isMobile
>>>
>>> Goran Lowkrantz <goran.lowkrantz at ismobile.com>
>>> System Architect, isMobile AB
>>> Sandviksgatan 81, PO Box 58, S-971 03 Luleå, Sweden
>>> Mobile: +46(0)70-587 87 82
>>> http://www.ismobile.com ...............................................
>
>
>
> ................................................... the future isMobile
>
> Goran Lowkrantz <goran.lowkrantz at ismobile.com>
> System Architect, isMobile, Aurorum 2, S-977 75 Luleå, Sweden
> Phone: +46(0)920-75559
> Mobile: +46(0)70-587 87 82 Fax: +46(0)70-615 87 82
>
> http://www.ismobile.com ...............................................
More information about the samba-technical
mailing list