[Samba] Samba freezes server when writing 50MB MS Word document
(sometimes)
Sebastian Scholz
gsscholz at gmail.com
Sat Jul 11 01:14:21 MDT 2009
Hi list
for a couple of days we have the problem that our samba pdc server (ubuntu
8.04
LTS [kernel 2.6.24-24] with 3.0.28a samba package) freezes sometimes when
someone saves a specific word document (about 50MB) from MS Word. It's not
always the same person. Saving the document - as nearly any huge MS Office
document - takes some time, about one minute (guessed).
First the server gets really slow. Some user programs on clients fail to
write.
The CPU usage of the smb process of the user writing the document goes up to
nearly 100%. After a while it's not possible to log on to the server - even
locally. It asks for a password but there won't be a prompt. Sometimes there
is
nothing on the console/screen. Hitting a button does not change a thing.
It's
not possible to kill the process. Another thing is that there are a couple
of
smbd processes for one user (see attached user smbd log).
When rebooting the server it does not shut down properly. I guess the
process
blocks the harddrive thus it cannot be unmounted. A ALT+Print+u and
ALT+Print+b
helps.
###########################################################################
This is what the syslog says:
Jul 3 15:01:44 [..] BUG: soft lockup - CPU#1 stuck for 11s! [smbd:23017]
Jul 3 15:01:44 [..]
Jul 3 15:01:44 [..] Pid: 23017, comm: smbd Not tainted (2.6.24-24-server
#1)
Jul 3 15:01:44 [..] EIP: 0060:[<f98c1e0c>] EFLAGS: 00000203 CPU: 1
Jul 3 15:01:44 [..] EIP is at xfs_iext_bno_to_ext+0x8c/0x1f0 [xfs]
Jul 3 15:01:44 [..] EAX: 000000ba EBX: 00002404 ECX: 000000ba EDX: 000000b5
Jul 3 15:01:44 [..] ESI: 00000000 EDI: 000051c7 EBP: 00000001 ESP: ee5298a4
Jul 3 15:01:44 [..] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Jul 3 15:01:44 [..] CR0: 8005003b CR2: b7f2f000 CR3: 3577d000 CR4: 000006b0
Jul 3 15:01:44 [..] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jul 3 15:01:44 [..] DR6: ffff0ff0 DR7: 00000400
Jul 3 15:01:44 [..] [<f989af59>] xfs_bmap_search_multi_extents+0x69/0x100
[xfs]
Jul 3 15:01:44 [..] [<f989b05f>] xfs_bmap_search_extents+0x6f/0x150 [xfs]
Jul 3 15:01:44 [..] [apic_timer_interrupt+0x28/0x30]
apic_timer_interrupt+0x28/0x30
Jul 3 15:01:44 [..] [<f98a4714>] xfs_bmapi+0x1e4/0x16a0 [xfs]
Jul 3 15:01:44 [..] [net_rx_action+0x12d/0x210] net_rx_action+0x12d/0x210
Jul 3 15:01:44 [..] [run_timer_softirq+0x30/0x1e0]
run_timer_softirq+0x30/0x1e0
Jul 3 15:01:44 [..] [tick_program_event+0x38/0x60]
tick_program_event+0x38/0x60
Jul 3 15:01:44 [..] [__do_softirq+0x82/0x110] __do_softirq+0x82/0x110
Jul 3 15:01:44 [..] [<f98c83c9>]
xfs_iomap_eof_want_preallocate+0x149/0x220 [xfs]
Jul 3 15:01:44 [..] [<f98c8bd3>] xfs_iomap_write_delay+0x153/0x3e0 [xfs]
Jul 3 15:01:44 [..] [<f98c7fe7>] xfs_iomap+0x277/0x510 [xfs]
Jul 3 15:01:44 [..] [<f98c1eda>] xfs_iext_bno_to_ext+0x15a/0x1f0 [xfs]
Jul 3 15:01:44 [..] [<f98c8274>] xfs_iomap+0x504/0x510 [xfs]
Jul 3 15:01:44 [..] [<f98e8393>] __xfs_get_blocks+0x73/0x2b0 [xfs]
Jul 3 15:01:44 [..] [<f98e8628>] xfs_get_blocks+0x28/0x30 [xfs]
Jul 3 15:01:44 [..] [__block_prepare_write+0x228/0x400]
__block_prepare_write+0x228/0x400
Jul 3 15:01:44 [..] [sg:__alloc_pages+0x5a/0x380] __alloc_pages+0x5a/0x380
Jul 3 15:01:44 [..] [ext3:__grab_cache_page+0x97/0x1a90]
__grab_cache_page+0x97/0xc0
Jul 3 15:01:44 [..] [ext3:block_write_begin+0x4d/0x590]
block_write_begin+0x4d/0xf0
Jul 3 15:01:44 [..] [<f98e8600>] xfs_get_blocks+0x0/0x30 [xfs]
Jul 3 15:01:44 [..] [<f98e7da6>] xfs_vm_write_begin+0x46/0x50 [xfs]
Jul 3 15:01:44 [..] [<f98e8600>] xfs_get_blocks+0x0/0x30 [xfs]
Jul 3 15:01:44 [..] [xfs:generic_file_buffered_write+0x117/0xcf0]
generic_file_buffered_write+0x117/0x630
Jul 3 15:01:44 [..] [<f98f0880>] xfs_write+0x830/0x910 [xfs]
Jul 3 15:01:44 [..] [<f98ec6cc>] xfs_file_aio_write+0x5c/0x70 [xfs]
Jul 3 15:01:44 [..] [xfs:do_sync_write+0xd5/0x120]
do_sync_write+0xd5/0x120
Jul 3 15:01:44 [..] [<c0145fa0>] autoremove_wake_function+0x0/0x40
Jul 3 15:01:44 [..] [<c0145fa0>] autoremove_wake_function+0x0/0x40
Jul 3 15:01:44 [..] [set_current_groups+0x25e/0x270]
set_current_groups+0x25e/0x270
Jul 3 15:01:44 [..] [<c019a2c0>] do_sync_write+0x0/0x120
Jul 3 15:01:44 [..] [vfs_write+0xb9/0x170] vfs_write+0xb9/0x170
Jul 3 15:01:44 [..] [__capable+0x8/0x20] __capable+0x8/0x20
Jul 3 15:01:44 [..] [sys_pwrite64+0x66/0x80] sys_pwrite64+0x66/0x80
Jul 3 15:01:44 [..] [sysenter_past_esp+0x6b/0xa1]
sysenter_past_esp+0x6b/0xa1
Jul 3 15:01:44 [..] =======================
[.. this repeats x times ...]
[.. REBOOT / relevant boot messages ..]
Jul 3 15:33:04 [..] [ 115.676800] Driver 'sd' needs updating - please use
bus_type methods
Jul 3 15:33:04 [..] [ 115.676853] sd 0:0:0:0: [sda] 2927603712 512-byte
hardware sectors (1498933 MB)
Jul 3 15:33:04 [..] [ 115.676863] sd 0:0:0:0: [sda] Write Protect is off
Jul 3 15:33:04 [..] [ 115.676864] sd 0:0:0:0: [sda] Mode Sense: 06 00 10
00
Jul 3 15:33:04 [..] [ 115.676879] sd 0:0:0:0: [sda] Write cache: enabled,
read cache: enabled, supports DPO and FUA
Jul 3 15:33:04 [..] [ 115.676921] sd 0:0:0:0: [sda] 2927603712 512-byte
hardware sectors (1498933 MB)
Jul 3 15:33:04 [..] [ 115.676929] sd 0:0:0:0: [sda] Write Protect is off
Jul 3 15:33:04 [..] [ 115.676931] sd 0:0:0:0: [sda] Mode Sense: 06 00 10
00
Jul 3 15:33:04 [..] [ 115.676945] sd 0:0:0:0: [sda] Write cache: enabled,
read cache: enabled, supports DPO and FUA
Jul 3 15:33:04 [..] [ 115.676947] sda: sda1
[.. snipp ..]
Jul 3 15:33:04 [..] [ 129.848176] SGI XFS with ACLs, security attributes,
realtime, large block numbers, no debug enabled
Jul 3 15:33:04 [..] [ 129.848501] SGI XFS Quota Management subsystem
Jul 3 15:33:04 [..] [ 129.875533] XFS mounting filesystem sda1
Jul 3 15:33:04 [..] [ 130.036456] Starting XFS recovery on filesystem:
sda1 (logdev: internal)
Jul 3 15:33:04 [..] [ 131.699948] Ending XFS recovery on filesystem: sda1
(logdev: internal)
[.. same error again ..]
Jul 3 16:12:25 [..] [ 2488.349286] Filesystem "sda1": Access to block zero
in inode 1073770465 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0
lastx: 1cf
Jul 3 16:12:25 [..] [ 2488.351994] Filesystem "sda1": Access to block zero
in inode 1073770465 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0
lastx: 1cf
Jul 3 16:12:25 [..] [ 2488.354591] Filesystem "sda1": Access to block zero
in inode 1073770465 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0
lastx: 1cf
Jul 3 16:12:25 [..] [ 2488.357180] Filesystem "sda1": Access to block zero
in inode 1073770465 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0
lastx: 1cf
Jul 3 16:12:35 [..] [ 2499.059013] BUG: soft lockup - CPU#0 stuck for 11s!
[smbd:7280]
Jul 3 16:12:35 [..] [ 2499.060252]
Jul 3 16:12:35 [..] [ 2499.060254] Pid: 7280, comm: smbd Not tainted
(2.6.24-24-server #1)
Jul 3 16:12:35 [..] [ 2499.060256] EIP: 0060:[<f98aff07>] EFLAGS: 00000282
CPU: 0
Jul 3 16:12:35 [..] [ 2499.060275] EIP is at
xfs_bmap_search_multi_extents+0x17/0x100 [xfs]
Jul 3 16:12:35 [..] [ 2499.060276] EAX: ef34c038 EBX: ef34c038 ECX:
00000000 EDX: 00001a93
Jul 3 16:12:35 [..] [ 2499.060278] ESI: 00001a93 EDI: f4565d44 EBP:
f4565d28 ESP: f4565c04
Jul 3 16:12:35 [..] [ 2499.060279] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS:
0068
Jul 3 16:12:35 [..] [ 2499.060281] CR0: 8005003b CR2: b792fa84 CR3:
351b2000 CR4: 000006f0
Jul 3 16:12:35 [..] [ 2499.060282] DR0: 00000000 DR1: 00000000 DR2:
00000000 DR3: 00000000
Jul 3 16:12:35 [..] [ 2499.060283] DR6: ffff0ff0 DR7: 00000400
Jul 3 16:12:35 [..] [ 2499.060293] [<f98b005f>]
xfs_bmap_search_extents+0x6f/0x150 [xfs]
Jul 3 16:12:35 [..] [ 2499.060315] [apic_timer_interrupt+0x28/0x30]
apic_timer_interrupt+0x28/0x30
Jul 3 16:12:35 [..] [ 2499.060322] [<f98b9714>] xfs_bmapi+0x1e4/0x16a0
[xfs]
Jul 3 16:12:35 [..] [ 2499.060346] [run_timer_softirq+0x30/0x1e0]
run_timer_softirq+0x30/0x1e0
Jul 3 16:12:35 [..] [ 2499.060352] [__do_softirq+0x82/0x110]
__do_softirq+0x82/0x110
Jul 3 16:12:35 [..] [ 2499.060359] [<f98b00d8>]
xfs_bmap_search_extents+0xe8/0x150 [xfs]
Jul 3 16:12:35 [..] [ 2499.060379] [<f99048e5>] xfs_zero_eof+0x185/0x300
[xfs]
Jul 3 16:12:35 [..] [ 2499.060405] [<f9905380>] xfs_write+0x330/0x910
[xfs]
Jul 3 16:12:35 [..] [ 2499.060429] [skb_release_all+0x54/0x130]
skb_release_all+0x54/0x130
Jul 3 16:12:35 [..] [ 2499.060435] [<f99016cc>]
xfs_file_aio_write+0x5c/0x70 [xfs]
Jul 3 16:12:35 [..] [ 2499.060456] [xfs:do_sync_write+0xd5/0x120]
do_sync_write+0xd5/0x120
Jul 3 16:12:35 [..] [ 2499.060462] [<c0145fa0>]
autoremove_wake_function+0x0/0x40
Jul 3 16:12:35 [..] [ 2499.060465] [<c0145fa0>]
autoremove_wake_function+0x0/0x40
Jul 3 16:12:35 [..] [ 2499.060471] [<c019a2c0>] do_sync_write+0x0/0x120
Jul 3 16:12:35 [..] [ 2499.060473] [vfs_write+0xb9/0x170]
vfs_write+0xb9/0x170
Jul 3 16:12:35 [..] [ 2499.060477] [sys_pwrite64+0x66/0x80]
sys_pwrite64+0x66/0x80
Jul 3 16:12:35 [..] [ 2499.060479] [sysenter_past_esp+0x6b/0xa1]
sysenter_past_esp+0x6b/0xa1
Jul 3 16:12:35 [..] [ 2499.060483] [rt_mutex_slowlock+0x460/0x4b0]
rt_mutex_slowlock+0x460/0x4b0
Jul 3 16:12:35 [..] [ 2499.060488] =======================
Jul 3 16:12:36 [..] [ 2499.142186] named[5015]: segfault at 005047ff eip
b7fd1410 esp 005047ff error 4
Jul 3 16:12:47 [..] [ 2510.849153] BUG: soft lockup - CPU#0 stuck for 11s!
[smbd:7280]
[..and so on..]
###########################################################################
The samba log of the user with the smbd process with pid 23017 says (the
main smbd process log does not say anything):
[2009/07/02 20:51:03, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service profiles initially as user tomh
(uid=2048, gid=1901) (pid 23017)
[2009/07/02 20:51:06, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service profiles initially as user tomh
(uid=2048, gid=1901) (pid 23017)
[2009/07/02 20:51:06, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service profiles initially as user tomh
(uid=2048, gid=1901) (pid 23017)
[2009/07/02 20:51:06, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service profiles initially as user tomh
(uid=2048, gid=1901) (pid 23017)
[2009/07/02 20:51:06, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service profiles initially as user tomh
(uid=2048, gid=1901) (pid 23017)
[2009/07/02 21:01:33, 1] smbd/service.c:close_cnum(1230)
pc_tom (192.168.43.189) closed connection to service profiles
[2009/07/02 21:01:33, 1] smbd/service.c:close_cnum(1230)
pc_tom (192.168.43.189) closed connection to service profiles
[2009/07/02 21:01:33, 1] smbd/service.c:close_cnum(1230)
pc_tom (192.168.43.189) closed connection to service profiles
[2009/07/02 21:01:33, 1] smbd/service.c:close_cnum(1230)
pc_tom (192.168.43.189) closed connection to service profiles
[2009/07/02 21:01:33, 1] smbd/service.c:close_cnum(1230)
pc_tom (192.168.43.189) closed connection to service profiles
[2009/07/03 15:02:44, 0] lib/util_sock.c:write_data(562)
write_data: write failure in writing to client 0.0.0.0. Error Connection
reset by peer
[2009/07/03 15:02:44, 0] lib/util_sock.c:send_smb(761)
Error writing 4 bytes to client. -1. (Connection reset by peer)
[2009/07/03 15:02:44, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service tomh initially as user tomh
(uid=2048, gid=1901) (pid 11831)
[2009/07/03 15:02:44, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service data initially as user tomh
(uid=0, gid=1901) (pid 11831)
[2009/07/03 15:02:45, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service projekte initially as user tomh
(uid=0, gid=1901) (pid 11831)
[2009/07/03 15:03:56, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service exchange initially as user tomh
(uid=0, gid=1901) (pid 11831)
[2009/07/03 15:03:56, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service orga initially as user tomh
(uid=0, gid=1901) (pid 11831)
[2009/07/03 15:03:56, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service info initially as user tomh
(uid=0, gid=1901) (pid 11831)
[2009/07/03 15:03:56, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service archiv initially as user tomh
(uid=0, gid=1901) (pid 11831)
[2009/07/03 15:03:56, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service setup initially as user tomh
(uid=2048, gid=1901) (pid 11831)
[2009/07/03 15:05:34, 0] lib/util_sock.c:write_data(562)
write_data: write failure in writing to client 192.168.43.189. Error
Connection reset by peer
[2009/07/03 15:05:34, 0] lib/util_sock.c:send_smb(761)
Error writing 4 bytes to client. -1. (Connection reset by peer)
[2009/07/03 15:05:34, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service tomh initially as user tomh
(uid=2048, gid=1901) (pid 11895)
[2009/07/03 15:05:38, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service orga initially as user tomh
(uid=0, gid=1901) (pid 11895)
[2009/07/03 15:05:47, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service projekte initially as user tomh
(uid=0, gid=1901) (pid 11895)
[2009/07/03 15:07:17, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service projekte initially as user tomh
(uid=0, gid=1901) (pid 11920)
[2009/07/03 15:07:17, 1] smbd/service.c:make_connection_snum(1033)
pc_tom (192.168.43.189) connect to service tomh initially as user tomh
(uid=2048, gid=1901) (pid 11920)
###########################################################################
Parts of the smb.conf:
[global]
unix charset = utf-8
workgroup = XXXXXXXX
netbios name = XXXXXXXX
server string = XXXXXXXX Domain Server
enable privileges = yes
interfaces = eth0,lo
bind interfaces only = Yes
passdb backend = ldapsam:ldapi://%2fvar%2frun%2fslapd%2fldapi
log level = 1
syslog = 0
log file = /var/log/samba/%m
max log size = 5000
smb ports = 137 138 139 445
name resolve order = wins bcast hosts
time server = Yes
printcap name = CUPS
# wins support = Yes
dns proxy = Yes
ldap suffix = dc=ebox
ldap machine suffix = ou=Computers
ldap user suffix = ou=Users
ldap group suffix = ou=Groups
ldap idmap suffix = ou=Idmap
ldap admin dn = cn=admin,dc=ebox
map acl inherit = Yes
printing = cups
encrypt passwords = Yes
obey pam restrictions = No
ldap passwd sync = Yes
mangling method = hash2
logon script = logon.bat
logon drive = I:
logon home =
logon path = \\XXXXXXXX\profiles\%U
domain logons = Yes
os level = 65
preferred master = Yes
domain master = Yes
add user script = /usr/sbin/smbldap-useradd -m "%u"
ldap delete dn = Yes
add machine script = /usr/sbin/smbldap-useradd -w "%u"
add group script = /usr/sbin/smbldap-groupadd -p "%g"
add user to group script = /usr/sbin/smbldap-groupmod -m "%u" "%g"
delete user from group script = /usr/sbin/smbldap-groupmod -x "%u" "%g"
set primary group script = /usr/sbin/smbldap-usermod -g "%g" "%u"
create mask = 770
directory mask = 770
force create mode = 770
force directory mode = 770
[.. snipp ...]
[projekte]
comment = XXXXXXXX Projekte
path = /home/samba/shares/projects
valid users = "XXXXXXXX"
read list =
write list = "XXXXXXXX"
admin users = "XXXXXXXX"
read only = No
browseable = Yes
###########################################################################
The samba shares are behind an Adaptec Raid Controller and are formated with
XFS. We did some memtests but they showed no problem. I would like to stick
to
the 3.0.x branch of samba and to the ubuntu package system (There must be a
reason for the LTS).
I've read the "samba freezes the server" thread but this is for a different
kernel.
Any help would be really appreciated
Thanks
Sebastian
More information about the samba
mailing list