[linux-cifs-client] OOPS in 2.6.26
Jeff Layton
jlayton at redhat.com
Wed Jul 16 18:11:31 GMT 2008
On Wed, 16 Jul 2008 10:25:03 -0700
Gautam Iyer <gi1242+samba at stanford.edu> wrote:
> (Resending without attachments, as I think my post was automatically
> rejected. Jeff -- Attachments follow in an off list email).
>
> On Wed, Jul 16, 2008 at 10:26:20AM -0400, Jeff Layton wrote:
>
> >> I just upgraded to 2.6.26. On copying a large file from my server, my
> >> client Oops'ed, and eventually caused my system to become unusable.
> >> Here's the message I got:
> >>
> >> BUG: unable to handle kernel paging request at f8001d6f
> >> IP: [<f91c4b42>] :cifs:CIFSSMBQAllEAs+0x242/0x340
> >> *pde = 00000000
> >> Oops: 0000 [#1] SMP
> >> Modules linked in: nls_iso8859_1 cifs nls_base mmc_block b43 ssb rng_core mac80211 crc32 led_class input_polldev rfkill_input rfkill aes_i586 aes_generic libafs(P) e1000e i915 drm fuse ipv6 autofs4 ipt_recent ipt_addrtype xt_multiport xt_mac xt_state xt_tcpudp ipt_REJECT ipt_LOG xt_limit iptable_nat nf_nat nf_conntrack_ipv4 iptable_filter ip_tables xt_iprange x_tables nf_conntrack_ftp nf_conntrack snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device ext2 mbcache arc4 ecb crypto_blkcipher loop 8250_pnp 8250 serial_core acpi_cpufreq sg sr_mod cdrom usbhid usb_storage scsi_mod intelfb fb i2c_algo_bit cfbcopyarea intel_agp i2c_core button sdhci mmc_core video backlight output wmi battery ehci_hcd ac snd_hda_intel agpgart uhci_hcd snd_pcm snd_timer snd_page_alloc snd_hwdep cfbimgblt cfbfillrect snd serio_raw usbcore soundcore evdev [last unloaded: ricoh_mmc]
> >>
> >> Pid: 1417, comm: cp Tainted: P A (2.6.26 #1)
> >> EIP: 0060:[<f91c4b42>] EFLAGS: 00210282 CPU: 1
> >> EIP is at CIFSSMBQAllEAs+0x242/0x340 [cifs]
> >> EAX: f8001d6e EBX: f8001d6e ECX: 006d6495 EDX: 1a59604f
> >> ESI: d9cd003d EDI: 00000000 EBP: f8001d72 ESP: ed3cdec0
> >> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> >> Process cp (pid: 1417, ti=ed3cc000 task=cf105b80 task.ti=ed3cc000)
> >> Stack: d9cd0000 ed3cdee4 00000000 ef9e97c8 f76abf40 d9f99c00 000008bf 006d6495
> >> dc44d000 0000004f d9cd0000 d9cd0000 f76abf40 ffffffa1 00000000 00000000
> >> f91de6da 00000000 00000000 f8b96ce0 00000000 000008bf f76e01c0 d9f99c00
> >> Call Trace:
> >> [<f91de6da>] cifs_listxattr+0xba/0x180 [cifs]
> >> [<f91de620>] cifs_listxattr+0x0/0x180 [cifs]
> >> [<c0194514>] vfs_listxattr+0x24/0x40
> >> [<c01947e0>] listxattr+0x50/0xb0
> >> [<c01948c9>] sys_llistxattr+0x39/0x50
> >> [<c01c7f40>] reiserfs_file_write+0x0/0xc0
> >> [<c0103bd9>] sysenter_past_esp+0x6a/0x91
> >> [<c0300000>] migration_call+0x340/0x460
> >> =======================
> >> Code: 85 d8 00 00 00 0f b6 43 01 0f b7 4b 02 29 c2 83 ea 05 29 ca 85 d2 0f 8e 63 ff ff ff 8d 44 05 01 01 c8 89 c3 8b 4c 24 1c 8d 68 04 <0f> b6 43 01 8d 44 08 06 39 44 24 48 89 44 24 1c 7e bc a1 34 f8
> >> EIP: [<f91c4b42>] CIFSSMBQAllEAs+0x242/0x340 [cifs] SS:ESP 0068:ed3cdec0
> >> ---[ end trace f626a9f8ae856e81 ]---
> >
> > Not a panic that I've seen before. Is this reproducible?
>
> Ah. Haven't tried reproducing. Since it's the machine I use primarily
> for work, intentionally crashing it will have to wait till the
> weekend...
>
> If I free up some disk space, I could try reproducing it under kvm.
>
> >> Finally, I should mention that I have a few send / receive errors in my
> >> /var/log/messages too:
> >>
> >> CIFS VFS: server not responding
> >> CIFS VFS: No response to cmd 46 mid 323
> >> CIFS VFS: Send error in read = -11
> >> CIFS VFS: Write2 ret -11, wrote 0
> >> CIFS VFS: No response for cmd 162 mid 14620
> >> CIFS VFS: No response to cmd 47 mid 14626
> >> CIFS VFS: No response to cmd 47 mid 14627
> >> CIFS VFS: Write2 ret -11, wrote 0
> >> CIFS VFS: No response for cmd 162 mid 14633
> >> CIFS VFS: No response to cmd 47 mid 14632
> >> CIFS VFS: Write2 ret -11, wrote 0
> >> CIFS VFS: No response to cmd 47 mid 14639
> >> CIFS VFS: No response to cmd 47 mid 14640
> >
> > cmd 47 (0x2f) is SMB_COM_WRITE_ANDX. 46 (0x2e) is SMB_COM_READ_ANDX.
> > Those errors mainly just mean that your server is being slow to
> > respond here. SMBQueryAllEAs uses SMB_COM_TRANSACTION2 (0x32) and I
> > don't see any of those in the logs above. Still though, I suppose it
> > could be related to retransmissions of those calls.
>
> Yes, my server is known to be slow. The samba throughput (on Mac/Linux)
> is much lower than the hard disk speed and channel capacity. Thus a lot
> of complicated operations (e.g. running mke2fs, or rsync-ing a large
> directory) tend to fail.
>
> This is the first time I've had trouble with cp though...
>
> >> Any idea what's going on? My server is a Vantex Nexstar LX, and my
> >> client runs Gentoo (if that makes any difference). I attach my kernel
> >> configuration,
> >
> > Any chance you could bzip2 cifs.ko module and send it to me? It would
> > be nice to disassemble it and see if we can tell where it fell down.
>
> Of course. I have no control over the server. But my cifs.ko and
> System.map are both attached. Let me know if you need anything else,
>
> GI
>
Thanks for the info. Here's some disassembly from around that area:

    5b24: 29 c2                 sub    %eax,%edx
    5b26: 83 ea 05              sub    $0x5,%edx
    5b29: 29 ca                 sub    %ecx,%edx
    5b2b: 85 d2                 test   %edx,%edx
    5b2d: 0f 8e 63 ff ff ff     jle    5a96 <CIFSSMBQAllEAs+0x196>
    5b33: 8d 44 05 01           lea    0x1(%ebp,%eax,1),%eax
    5b37: 01 c8                 add    %ecx,%eax
    5b39: 89 c3                 mov    %eax,%ebx
    5b3b: 8b 4c 24 1c           mov    0x1c(%esp),%ecx
    5b3f: 8d 68 04              lea    0x4(%eax),%ebp
    5b42: 0f b6 43 01           movzbl 0x1(%ebx),%eax    <<<< CRASH HERE
    5b46: 8d 44 08 06           lea    0x6(%eax,%ecx,1),%eax
    5b4a: 39 44 24 48           cmp    %eax,0x48(%esp)
    5b4e: 89 44 24 1c           mov    %eax,0x1c(%esp)
    5b52: 7e bc                 jle    5b10 <CIFSSMBQAllEAs+0x210>
    5b54: a1 5c 01 00 00        mov    0x15c,%eax
    5b59: 89 ee                 mov    %ebp,%esi
    5b5b: c6 47 04 2e           movb   $0x2e,0x4(%edi)
    5b5f: 89 07                 mov    %eax,(%edi)
    5b61: 83 c7 05              add    $0x5,%edi
    5b64: 89 7c 24 0c           mov    %edi,0xc(%esp)
    5b68: 0f b6 43 01           movzbl 0x1(%ebx),%eax
    5b6c: 89 c1                 mov    %eax,%ecx
    5b6e: c1 e9 02              shr    $0x2,%ecx

This is a large, hairy function without many handy markers nearby. The
faulting instruction zero-extends the byte at address %ebx+1 into %eax.
That jibes with the oops message (EBX is f8001d6e and the faulting
address is f8001d6f), but I'm having problems matching up the assembly
with the C code.

My guess is that %ebx is intended to point to a "struct fea" at this
point and that the crash occurred while trying to read its name_len.
Nothing stands out to me as a bug here though. A reproducer would sure
be nice.
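
For illustration only, here is a minimal userspace sketch of walking a list of variable-length EA entries with a bounds check before each header read. It is not the cifs code and not a proposed fix. The entry layout (flags at offset 0, name_len at offset 1, a little-endian value_len at offset 2, then the name, a NUL byte and the value) is an assumption inferred from the disassembly above, and fea_count_entries and FEA_HDR_LEN are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed entry layout (hypothetical, inferred from the disassembly):
 *   offset 0: flags     (1 byte)
 *   offset 1: name_len  (1 byte)
 *   offset 2: value_len (2 bytes, little-endian)
 *   offset 4: name (name_len bytes), one NUL byte, value (value_len bytes)
 */
#define FEA_HDR_LEN 4u

/*
 * Count the entries in buf[0..buf_len). Returns the number of complete
 * entries, or -1 if an entry header or body would run past the buffer.
 */
int fea_count_entries(const uint8_t *buf, size_t buf_len)
{
    size_t off = 0;
    int count = 0;

    while (off < buf_len) {
        /* Make sure the fixed header fits before reading its fields. */
        if (buf_len - off < FEA_HDR_LEN)
            return -1;

        size_t name_len = buf[off + 1];
        size_t value_len = (size_t)buf[off + 2] |
                           ((size_t)buf[off + 3] << 8);
        size_t entry_len = FEA_HDR_LEN + name_len + 1 + value_len;

        /* Make sure the whole entry fits before stepping past it. */
        if (buf_len - off < entry_len)
            return -1;

        off += entry_len;
        count++;
    }
    return count;
}
```

With this shape, a length field that points past the end of the buffer yields -1 instead of a read from an unmapped address like the one in the oops.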
--
Jeff Layton <jlayton at redhat.com>