[Samba] had 3 kernel panics since upgrade from 3.0.21a to 3.0.25
and 3.0.25a on CentOS 4.4
Urs Rau
urs.rau at uk.om.org
Sat Jun 16 14:23:53 GMT 2007
Does anybody have any ideas on this? Our server, which had been running
'rock-solid' with no crashes, has now had 3 kernel panics, each of which
appears to have been triggered by the newly upgraded samba daemon.
We used to run samba 3.0.21a for 'years' with no crashes.
On May 26 we upgraded to 3.0.25
June 9 10:58:28 first crash kernel panic
process involved according to log file smbd (see below)
June 14 16:32:23 second crash kernel panic
process involved according to log file smbd (see below)
In the morning of June 15 we upgraded to 3.0.25a
June 15 17:26:36 third crash kernel panic
process involved according to log file smbd (see below)
Some specs on our server.
OS: CentOS 4.4
kernel: 2.6.9-22.0.1.EL.1omsmp
CPU: SMP Dual AMD Opteron(tm) Processor 246 2GHz (about 4000 bogomips)
RAM: 2GB
SWAP: 5.8GB
users: peak at ~50-60 (varies; on average closer to 30 or so)
Here are the log files of the kernel panics. Is this a kernel bug
triggered by a samba daemon, or a samba daemon bug that crashed the kernel?
******************** first crash ************************
Jun 9 10:58:03 uk smbd[21513]: [2007/06/09 10:58:03, 0]
smbd/service.c:make_connection_snum(928)
Jun 9 10:58:03 uk smbd[21513]: Can't become connected user!
Jun 9 10:58:05 10.37.2.139 SecurityCenter: N/A: The Security Center
service has been stopped. It was prevented from running by a software
group policy.
Jun 9 10:58:10 10.37.2.139 W32Time: N/A: Time Provider NtpClient: This
machine is configured to use the domain hierarchy to determine its time
source, but the computer is joined to a Windows NT 4.0 domain. Windows
NT 4.0 domain controllers do not have a time service and do not support
domain hierarchy as a time source. NtpClient will attempt to use an
alternate configured external time source if available. If an external
time source is not configured or used for this computer, you may choose
to disable the NtpClient.
Jun 9 10:58:10 10.37.2.139 W32Time: N/A: The time provider NtpClient is
configured to acquire time from one or more time sources, however none
of the sources are accessible. NtpClient has no source of accurate time.
Jun 9 10:58:20 10.37.2.139 E100B: N/A: Intel(R) PRO/100 VM Network
Connection driver has been started
Jun 9 10:58:28 uk kernel: ------------[ cut here ]------------
Jun 9 10:58:28 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 9 10:58:28 uk kernel: invalid operand: 0000 [#1]
Jun 9 10:58:28 uk kernel: SMP
Jun 9 10:58:28 uk kernel: Modules linked in: nls_utf8 usb_storage vfat
fat md5 ipv6 parport_pc lp parport tun sunrpc ipt_MASQUERADE ipt_TOS
ipt_LOG iptable_filter iptable_mangle iptable_nat ip_conntrack ip_tables
button battery ac ohci_hcd e1000 tg3 floppy st ext3 jbd dm_mod gdth
aic79xx sata_sil libata sd_mod scsi_mod
Jun 9 10:58:28 uk kernel: CPU: 0
Jun 9 10:58:28 uk kernel: EIP: 0060:[<c01450fd>] Not tainted VLI
Jun 9 10:58:28 uk kernel: EFLAGS: 00010216 (2.6.9-22.0.1.EL.1omsmp)
Jun 9 10:58:28 uk kernel: EIP is at vma_prio_tree_add+0x36/0x95
Jun 9 10:58:28 uk kernel: eax: 00000009 ebx: e721c17c ecx: 00000000
edx: 000000b3
Jun 9 10:58:28 uk kernel: esi: f47ab85c edi: c293ba88 ebp: d0f28250
esp: db80ef3c
Jun 9 10:58:28 uk kernel: ds: 007b es: 007b ss: 0068
Jun 9 10:58:28 uk kernel: Process smbd (pid: 21513, threadinfo=db80e000
task=c269eef0)
Jun 9 10:58:28 uk kernel: Stack: e721c17c f76b4c40 c014e1ee e721c17c
000000fb 00000000 eaae6640 c014ed2e
Jun 9 10:58:28 uk kernel: d0f28250 d0f28248 00000000 00000001
00000000 c293b9d8 f76b4c40 000b4000
Jun 9 10:58:28 uk kernel: b74e8000 d0f2822c d0f28250 d0f28248
f76b4c40 f76b4c70 db80e000 eaae6640
Jun 9 10:58:28 uk kernel: Call Trace:
Jun 9 10:58:28 uk kernel: [<c014e1ee>] vma_link+0x9c/0xbc
Jun 9 10:58:28 uk kernel: [<c014ed2e>] do_mmap_pgoff+0x50e/0x666
Jun 9 10:58:28 uk kernel: [<c010b697>] sys_mmap2+0x7e/0xaf
Jun 9 10:58:28 uk kernel: [<c02d1213>] syscall_call+0x7/0xb
Jun 9 10:58:28 uk kernel: Code: c3 39 ca 74 08 0f 0b 0f 02 64 4e 2e c0
8b 43 08 2b 43 04 c1 e8 0c 8d 54 02 ff 8b 46 08 2b 46 04 c1 e8 0c 8d 44
01 ff 39 c2 74 08 <0f> 0b 10 02 64 4e 2e c0 c7 43 34 00 00 00 00 83 7e
34 00 c7 43
Jun 9 10:58:28 uk kernel: <0>Fatal exception: panic in 5 seconds
Jun 9 13:04:57 uk syslogd 1.4.1: restart (remote reception).
Jun 9 13:04:57 uk syslog: syslogd startup succeeded
Jun 9 13:04:57 uk kernel: klogd 1.4.1, log source = /proc/kmsg started.
******************** second crash ************************
Jun 14 16:25:02 uk nmbd[14947]: [2007/06/14 16:25:02, 0]
libsmb/nmblib.c:send_udp(791)
Jun 14 16:25:02 uk nmbd[14947]: Packet send failed to 10.37.2.70(138)
ERRNO=Operation not permitted
Jun 14 16:25:16 uk crond(pam_unix)[15907]: session closed for user root
Jun 14 16:25:33 uk clamd[10155]: SelfCheck: Database status OK.
Jun 14 16:26:44 uk -- MARK --
Jun 14 16:27:44 uk -- MARK --
Jun 14 16:28:01 uk crond(pam_unix)[16009]: session opened for user root
by (uid=0)
Jun 14 16:28:01 uk crond[16010]: (root) CMD (ping -c 1 uucp.cid.net >
/dev/null 2>&1;sleep 8;/usr/sbin/uucico -S mailhost)
Jun 14 16:28:44 uk -- MARK --
Jun 14 16:28:45 uk nmbd[14947]: [2007/06/14 16:28:45, 0]
libsmb/nmblib.c:send_udp(791)
Jun 14 16:28:45 uk nmbd[14947]: Packet send failed to 10.37.2.35(138)
ERRNO=Operation not permitted
Jun 14 16:29:44 uk -- MARK --
Jun 14 16:30:01 uk crond(pam_unix)[16031]: session opened for user root
by (uid=0)
Jun 14 16:30:01 uk crond[16032]: (root) CMD (/usr/lib/sa/sa1 1 1)
Jun 14 16:30:01 uk crond(pam_unix)[16033]: session opened for user root
by (uid=0)
Jun 14 16:30:01 uk crond[16035]: (root) CMD (/opt/sarcheck/bin/prst1)
Jun 14 16:30:01 uk crond(pam_unix)[16034]: session opened for user root
by (uid=0)
Jun 14 16:30:01 uk crond[16037]: (root) CMD (ping -c 1 uucp.cid.net >
/dev/null 2>&1;sleep 8;/usr/sbin/uucico -S mailhost)
Jun 14 16:30:01 uk crond(pam_unix)[16031]: session closed for user root
Jun 14 16:30:02 uk crond(pam_unix)[16033]: session closed for user root
Jun 14 16:30:10 uk crond(pam_unix)[16034]: session closed for user root
Jun 14 16:30:44 uk -- MARK --
Jun 14 16:31:32 uk crond(pam_unix)[16009]: session closed for user root
Jun 14 16:32:23 uk kernel: ------------[ cut here ]------------
Jun 14 16:32:23 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 14 16:32:23 uk kernel: invalid operand: 0000 [#1]
Jun 14 16:32:23 uk kernel: SMP
Jun 14 16:32:23 uk kernel: Modules linked in: vfat fat md5 ipv6
parport_pc lp parport tun sunrpc ipt_MASQUERADE ipt_TOS ipt_LOG
iptable_filter iptable_mangle iptable_nat ip_conntrack ip_tables
usb_storage button battery ac ohci_hcd e1000 tg3 floppy st ext3 jbd
dm_mod gdth aic79xx sata_sil libata sd_mod scsi_mod
Jun 14 16:32:23 uk kernel: CPU: 0
Jun 14 16:32:23 uk kernel: EIP: 0060:[<c01450fd>] Not tainted VLI
Jun 14 16:32:23 uk kernel: EFLAGS: 00010212 (2.6.9-22.0.1.EL.1omsmp)
Jun 14 16:32:23 uk kernel: EIP is at vma_prio_tree_add+0x36/0x95
Jun 14 16:32:23 uk kernel: eax: 00000009 ebx: c8a05804 ecx: 00000000
edx: 00000041
Jun 14 16:32:23 uk kernel: esi: f76587ac edi: ec80e450 ebp: e3a1b358
esp: cb136f3c
Jun 14 16:32:23 uk kernel: ds: 007b es: 007b ss: 0068
Jun 14 16:32:23 uk kernel: Process smbd (pid: 17852, threadinfo=cb136000
task=d22a85b0)
Jun 14 16:32:23 uk kernel: Stack: c8a05804 e9ae8300 c014e1ee c8a05804
000000fb 00000000 caa99480 c014ed2e
Jun 14 16:32:23 uk kernel: e3a1b358 e3a1b350 00000000 00000001
00000000 ec80e3a0 e9ae8300 00042000
Jun 14 16:32:23 uk kernel: b7867000 e3a1b334 e3a1b358 e3a1b350
e9ae8300 e9ae8330 cb136000 caa99480
Jun 14 16:32:23 uk kernel: Call Trace:
Jun 14 16:32:23 uk kernel: [<c014e1ee>] vma_link+0x9c/0xbc
Jun 14 16:32:23 uk kernel: [<c014ed2e>] do_mmap_pgoff+0x50e/0x666
Jun 14 16:32:23 uk kernel: [<c010b697>] sys_mmap2+0x7e/0xaf
Jun 14 16:32:23 uk kernel: [<c02d1213>] syscall_call+0x7/0xb
Jun 14 16:32:23 uk kernel: Code: c3 39 ca 74 08 0f 0b 0f 02 64 4e 2e c0
8b 43 08 2b 43 04 c1 e8 0c 8d 54 02 ff 8b 46 08 2b 46 04 c1 e8 0c 8d 44
01 ff 39 c2 74 08 <0f> 0b 10 02 64 4e 2e c0 c7 43 34 00 00 00 00 83 7e
34 00 c7 43
Jun 14 16:32:23 uk kernel: <0>Fatal exception: panic in 5 seconds
Jun 14 17:11:46 uk syslogd 1.4.1: restart (remote reception).
Jun 14 17:11:46 uk syslog: syslogd startup succeeded
Jun 14 17:11:46 uk kernel: klogd 1.4.1, log source = /proc/kmsg started.
******************** third crash ************************
Jun 15 17:26:36 uk kernel: ------------[ cut here ]------------
Jun 15 17:26:36 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 15 17:26:36 uk kernel: invalid operand: 0000 [#1]
Jun 15 17:26:36 uk kernel: SMP
Jun 15 17:26:36 uk kernel: Modules linked in: vfat fat usb_storage md5
ipv6 parport_pc lp parport tun sunrpc ipt_MASQUERADE ipt_TOS ipt_LOG
iptable_filter iptable_mangle iptable_nat ip_conntrack ip_tables button
battery ac ohci_hcd e1000 tg3 floppy st ext3 jbd dm_mod gdth aic79xx
sata_sil libata sd_mod scsi_mod
Jun 15 17:26:36 uk kernel: CPU: 0
Jun 15 17:26:36 uk kernel: EIP: 0060:[<c01450fd>] Not tainted VLI
Jun 15 17:26:36 uk kernel: EFLAGS: 00010216 (2.6.9-22.0.1.EL.1omsmp)
Jun 15 17:26:36 uk kernel: EIP is at vma_prio_tree_add+0x36/0x95
Jun 15 17:26:36 uk kernel: eax: 00000009 ebx: f2cf2754 ecx: 00000000
edx: 00000031
Jun 15 17:26:36 uk kernel: esi: f649b124 edi: f6616cb0 ebp: daf1b3b0
esp: c4093f3c
Jun 15 17:26:36 uk kernel: ds: 007b es: 007b ss: 0068
Jun 15 17:26:36 uk kernel: Process smbd (pid: 12530, threadinfo=c4093000
task=f72bf1f0)
Jun 15 17:26:36 uk kernel: Stack: f2cf2754 f07e2600 c014e1ee f2cf2754
000000fb 00000000 f374ab00 c014ed2e
Jun 15 17:26:36 uk kernel: daf1b3b0 daf1b3a8 00000000 00000001
00000000 f6616c00 f07e2600 00032000
Jun 15 17:26:36 uk kernel: b7bf6000 daf1b38c daf1b3b0 daf1b3a8
f07e2600 f07e2630 c4093000 f374ab00
Jun 15 17:26:36 uk kernel: Call Trace:
Jun 15 17:26:36 uk kernel: [<c014e1ee>] vma_link+0x9c/0xbc
Jun 15 17:26:36 uk kernel: [<c014ed2e>] do_mmap_pgoff+0x50e/0x666
Jun 15 17:26:36 uk kernel: [<c010b697>] sys_mmap2+0x7e/0xaf
Jun 15 17:26:36 uk kernel: [<c02d1213>] syscall_call+0x7/0xb
Jun 15 17:26:36 uk kernel: Code: c3 39 ca 74 08 0f 0b 0f 02 64 4e 2e c0
8b 43 08 2b 43 04 c1 e8 0c 8d 54 02 ff 8b 46 08 2b 46 04 c1 e8 0c 8d 44
01 ff 39 c2 74 08 <0f> 0b 10 02 64 4e 2e c0 c7 43 34 00 00 00 00 83 7e
34 00 c7 43
Jun 15 17:26:36 uk kernel: <0>Fatal exception: panic in 5 seconds
Am I reading this right? Is the process involved in each of these kernel
panics "Process smbd"?
Jun 9 10:58:28 uk kernel: Process smbd (pid: 21513, threadinfo=db80e000
task=c269eef0)
Jun 14 16:32:23 uk kernel: Process smbd (pid: 17852, threadinfo=cb136000
task=d22a85b0)
Jun 15 17:26:36 uk kernel: Process smbd (pid: 12530, threadinfo=c4093000
task=f72bf1f0)
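For anyone wanting to run the same check on their own logs, here is a
minimal sketch in Python that pulls the process name and pid out of each
oops "Process" line and tallies them. The function name panic_processes
and the SAMPLE string are illustrative assumptions; the sample simply
reuses the three wrapped lines quoted above.

```python
import re
from collections import Counter

# Matches the oops line naming the task that was current when the BUG fired.
# Only the part before the mail client's line wrap is needed.
PROCESS_RE = re.compile(r"kernel: Process (\S+) \(pid: (\d+)")


def panic_processes(log_text):
    """Return a list of (process_name, pid) for every oops 'Process' line."""
    return [(m.group(1), int(m.group(2))) for m in PROCESS_RE.finditer(log_text)]


# Hypothetical sample input: the three lines quoted in this mail.
SAMPLE = """\
Jun 9 10:58:28 uk kernel: Process smbd (pid: 21513, threadinfo=db80e000
task=c269eef0)
Jun 14 16:32:23 uk kernel: Process smbd (pid: 17852, threadinfo=cb136000
task=d22a85b0)
Jun 15 17:26:36 uk kernel: Process smbd (pid: 12530, threadinfo=c4093000
task=f72bf1f0)
"""

if __name__ == "__main__":
    hits = panic_processes(SAMPLE)
    # Count how often each process name shows up across the panics.
    print(Counter(name for name, _pid in hits))
```

Note that this only shows which task was current when the kernel hit the
BUG check; it does not by itself say whether the fault lies in that
process or in the kernel.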
I am sorry if I am pointing the finger at the wrong thing here. But it
seems strange that a server starts kernel panicking in this 'consistent'
way, always showing the same process 'smbd' involved, when the samba rpm
upgrade is the only thing that recently changed on this server.
Or is the fault really a kernel bug, as the log file entry suggests with
"kernel BUG at mm/prio_tree.c:528!"?
Jun 9 10:58:28 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 9 10:58:28 uk kernel: invalid operand: 0000 [#1]
Jun 9 10:58:28 uk kernel: SMP
Jun 14 16:32:23 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 14 16:32:23 uk kernel: invalid operand: 0000 [#1]
Jun 14 16:32:23 uk kernel: SMP
Jun 15 17:26:36 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 15 17:26:36 uk kernel: invalid operand: 0000 [#1]
Jun 15 17:26:36 uk kernel: SMP
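A quick way to confirm that all three panics hit the same kernel check
is to group the oops output by BUG location and by the function named on
the "EIP is at" line. The following is a rough sketch under stated
assumptions: the helper names bug_sites and faulting_functions and the
SAMPLE text are made up for illustration, and the sample lines are
copied from the three crash logs earlier in this mail.

```python
import re
from collections import Counter

# "kernel BUG at <file>:<line>!" identifies the failed check in the kernel source.
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
# "EIP is at <symbol>+0x<offset>/0x<length>" names the function that was executing.
EIP_RE = re.compile(r"EIP is at (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def bug_sites(log_text):
    """Count occurrences of each (source_file, line_number) BUG location."""
    return Counter((m.group(1), int(m.group(2))) for m in BUG_RE.finditer(log_text))


def faulting_functions(log_text):
    """Return the set of function names reported on 'EIP is at' lines."""
    return set(EIP_RE.findall(log_text))


# Hypothetical sample input assembled from the three crash logs above.
SAMPLE = """\
Jun 9 10:58:28 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 9 10:58:28 uk kernel: EIP is at vma_prio_tree_add+0x36/0x95
Jun 14 16:32:23 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 14 16:32:23 uk kernel: EIP is at vma_prio_tree_add+0x36/0x95
Jun 15 17:26:36 uk kernel: kernel BUG at mm/prio_tree.c:528!
Jun 15 17:26:36 uk kernel: EIP is at vma_prio_tree_add+0x36/0x95
"""

if __name__ == "__main__":
    print(bug_sites(SAMPLE))
    print(faulting_functions(SAMPLE))
```

If every panic maps to a single BUG location and a single function, as
it does for the sample here, that at least shows the crashes share one
failure point rather than being three unrelated faults.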
Any clever ideas? I will explore the redhat kernel list and see if there
is a newer kernel, maybe one from CentOS 4.5.
Google gives me a number of hits dating back many months where the
kernel BUG "kernel BUG at mm/prio_tree.c:528!" has been triggered with a
variety of processes (some smbd, but also a few others).
Many thanks for any pointers. It would be really great if I could tell
people Monday morning when they come back to work that we have found
the culprit, or better, that we have even managed to fix it. Here's
hoping.
Regards,
--
Urs Rau