[linux-cifs-client] OOPS in 2.6.26

Wed Jul 16 18:11:31 GMT 2008

On Wed, 16 Jul 2008 10:25:03 -0700
Gautam Iyer <gi1242+samba at stanford.edu> wrote:

> (Resending without attachments, as I think my post was automatically
> rejected. Jeff -- Attachments follow in an off list email).
> 
> On Wed, Jul 16, 2008 at 10:26:20AM -0400, Jeff Layton wrote:
> 
> >> I just upgraded to 2.6.26. On copying a large file from my server, my
> >> client Oops'ed, and eventually caused my system to become unusable.
> >> Here's the message I got:
> >> 
> >>     BUG: unable to handle kernel paging request at f8001d6f
> >>     IP: [<f91c4b42>] :cifs:CIFSSMBQAllEAs+0x242/0x340
> >>     *pde = 00000000 
> >>     Oops: 0000 [#1] SMP 
> >>     Modules linked in: nls_iso8859_1 cifs nls_base mmc_block b43 ssb rng_core mac80211 crc32 led_class input_polldev rfkill_input rfkill aes_i586 aes_generic libafs(P) e1000e i915 drm fuse ipv6 autofs4 ipt_recent ipt_addrtype xt_multiport xt_mac xt_state xt_tcpudp ipt_REJECT ipt_LOG xt_limit iptable_nat nf_nat nf_conntrack_ipv4 iptable_filter ip_tables xt_iprange x_tables nf_conntrack_ftp nf_conntrack snd_pcm_oss snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device ext2 mbcache arc4 ecb crypto_blkcipher loop 8250_pnp 8250 serial_core acpi_cpufreq sg sr_mod cdrom usbhid usb_storage scsi_mod intelfb fb i2c_algo_bit cfbcopyarea intel_agp i2c_core button sdhci mmc_core video backlight output wmi battery ehci_hcd ac snd_hda_intel agpgart uhci_hcd snd_pcm snd_timer snd_page_alloc snd_hwdep cfbimgblt cfbfillrect snd serio_raw usbcore soundcore evdev [last unloaded: ricoh_mmc]
> >>     
> >>     Pid: 1417, comm: cp Tainted: P       A  (2.6.26 #1)
> >>     EIP: 0060:[<f91c4b42>] EFLAGS: 00210282 CPU: 1
> >>     EIP is at CIFSSMBQAllEAs+0x242/0x340 [cifs]
> >>     EAX: f8001d6e EBX: f8001d6e ECX: 006d6495 EDX: 1a59604f
> >>     ESI: d9cd003d EDI: 00000000 EBP: f8001d72 ESP: ed3cdec0
> >>     DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> >>     Process cp (pid: 1417, ti=ed3cc000 task=cf105b80 task.ti=ed3cc000)
> >>     Stack: d9cd0000 ed3cdee4 00000000 ef9e97c8 f76abf40 d9f99c00 000008bf 006d6495 
> >>     dc44d000 0000004f d9cd0000 d9cd0000 f76abf40 ffffffa1 00000000 00000000 
> >>     f91de6da 00000000 00000000 f8b96ce0 00000000 000008bf f76e01c0 d9f99c00 
> >>     Call Trace:
> >>     [<f91de6da>] cifs_listxattr+0xba/0x180 [cifs]
> >>     [<f91de620>] cifs_listxattr+0x0/0x180 [cifs]
> >>     [<c0194514>] vfs_listxattr+0x24/0x40
> >>     [<c01947e0>] listxattr+0x50/0xb0
> >>     [<c01948c9>] sys_llistxattr+0x39/0x50
> >>     [<c01c7f40>] reiserfs_file_write+0x0/0xc0
> >>     [<c0103bd9>] sysenter_past_esp+0x6a/0x91
> >>     [<c0300000>] migration_call+0x340/0x460
> >>     =======================
> >>     Code: 85 d8 00 00 00 0f b6 43 01 0f b7 4b 02 29 c2 83 ea 05 29 ca 85 d2 0f 8e 63 ff ff ff 8d 44 05 01 01 c8 89 c3 8b 4c 24 1c 8d 68 04 <0f> b6 43 01 8d 44 08 06 39 44 24 48 89 44 24 1c 7e bc a1 34 f8 
> >>     EIP: [<f91c4b42>] CIFSSMBQAllEAs+0x242/0x340 [cifs] SS:ESP 0068:ed3cdec0
> >>     ---[ end trace f626a9f8ae856e81 ]---
> > 
> > Not a panic that I've seen before. Is this reproducible?
> 
> Ah. Haven't tried reproducing. Since it's the machine I use primarily
> for work, intentionally crashing it will have to wait till the
> weekend...
> 
> If I free up some disk space, I could try reproducing it under kvm.
> 
> >> Finally, I should mention that I have a few send / receive errors in my
> >> /var/log/messages too:
> >> 
> >>     CIFS VFS: server not responding
> >>     CIFS VFS: No response to cmd 46 mid 323
> >>     CIFS VFS: Send error in read = -11
> >>     CIFS VFS: Write2 ret -11, wrote 0
> >>     CIFS VFS: No response for cmd 162 mid 14620
> >>     CIFS VFS: No response to cmd 47 mid 14626
> >>     CIFS VFS: No response to cmd 47 mid 14627
> >>     CIFS VFS: Write2 ret -11, wrote 0
> >>     CIFS VFS: No response for cmd 162 mid 14633
> >>     CIFS VFS: No response to cmd 47 mid 14632
> >>     CIFS VFS: Write2 ret -11, wrote 0
> >>     CIFS VFS: No response to cmd 47 mid 14639
> >>     CIFS VFS: No response to cmd 47 mid 14640
> > 
> > cmd 47 (0x2f) is SMB_COM_WRITE_ANDX. 46 (0x2e) is SMB_COM_READ_ANDX.
> > Those errors mainly just mean that your server is being slow to
> > > respond here. CIFSSMBQAllEAs uses SMB_COM_TRANSACTION2 (0x32) and I
> > don't see any of those in the logs above. Still though, I suppose it
> > could be related to retransmissions of those calls. 
> 
> Yes, my server is known to be slow. The samba throughput (on Mac/Linux)
> is much lower than the hard disk speed and channel capacity. Thus a lot
> of complicated operations (e.g. running mke2fs, or rsync-ing a large
> directory) tend to fail.
> 
> This is the first time I've had trouble with cp though...
> 
> >> Any idea what's going on? My server is a Vantex Nexstar LX, and my
> >> client runs Gentoo (if that makes any difference). I attach my kernel
> >> configuration,
> > 
> > Any chance you could bzip2 cifs.ko module and send it to me? It would
> > be nice to disassemble it and see if we can tell where it fell down.
> 
> Of course. I have no control over the server. But my cifs.ko and
> System.map are both attached. Let me know if you need anything else,
> 
> GI
> 

Thanks for the info. Here's some disassembly from around that area:

    5b24:       29 c2                   sub    %eax,%edx
    5b26:       83 ea 05                sub    $0x5,%edx
    5b29:       29 ca                   sub    %ecx,%edx
    5b2b:       85 d2                   test   %edx,%edx
    5b2d:       0f 8e 63 ff ff ff       jle    5a96 <CIFSSMBQAllEAs+0x196>
    5b33:       8d 44 05 01             lea    0x1(%ebp,%eax,1),%eax
    5b37:       01 c8                   add    %ecx,%eax
    5b39:       89 c3                   mov    %eax,%ebx
    5b3b:       8b 4c 24 1c             mov    0x1c(%esp),%ecx
    5b3f:       8d 68 04                lea    0x4(%eax),%ebp
    5b42:       0f b6 43 01             movzbl 0x1(%ebx),%eax  <<<< CRASH HERE
    5b46:       8d 44 08 06             lea    0x6(%eax,%ecx,1),%eax
    5b4a:       39 44 24 48             cmp    %eax,0x48(%esp)
    5b4e:       89 44 24 1c             mov    %eax,0x1c(%esp)
    5b52:       7e bc                   jle    5b10 <CIFSSMBQAllEAs+0x210>
    5b54:       a1 5c 01 00 00          mov    0x15c,%eax
    5b59:       89 ee                   mov    %ebp,%esi
    5b5b:       c6 47 04 2e             movb   $0x2e,0x4(%edi)
    5b5f:       89 07                   mov    %eax,(%edi)
    5b61:       83 c7 05                add    $0x5,%edi
    5b64:       89 7c 24 0c             mov    %edi,0xc(%esp)
    5b68:       0f b6 43 01             movzbl 0x1(%ebx),%eax
    5b6c:       89 c1                   mov    %eax,%ecx
    5b6e:       c1 e9 02                shr    $0x2,%ecx

This is a large, hairy function without many handy markers nearby. The
marked instruction zero-extends the byte at address %ebx+1 and stores
the result in %eax. That jibes with the oops message, but I'm having
problems matching up the assembly with the C code.
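
As a sanity check on that reading, the address arithmetic can be
sketched in C. The helper names (effective_address, symbol_start) are
made up for illustration; the numbers come from the oops and the
listing above. EBX is f8001d6e, so 0x1(%ebx) is f8001d6f, which is the
address in the "unable to handle kernel paging request" line. Likewise,
the jle target 5a96 is labelled CIFSSMBQAllEAs+0x196, which puts the
start of the function at 0x5900 in the object file, so +0x242 lands on
5b42, the marked instruction.

```c
#include <stdint.h>

/* Effective address of a "disp(%base)" memory operand on 32-bit x86.
 * Hypothetical helper, for illustration only. */
static uint32_t effective_address(uint32_t base, int32_t disp)
{
    return base + (uint32_t)disp;
}

/* Start address of a function, given an address inside it and the
 * "+offset" that the disassembler or oops printed for that address. */
static uint32_t symbol_start(uint32_t addr, uint32_t offset)
{
    return addr - offset;
}
```

With the values from the oops, effective_address(0xf8001d6e, 1) is
0xf8001d6f, and symbol_start(0x5a96, 0x196) + 0x242 is 0x5b42.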

My guess is that %ebx is intended to point to a "struct fea" at this
point, and that the crash occurred while trying to read its name_len.
Nothing stands out to me as a bug here, though. A reproducer would sure
be nice.
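
To make the "struct fea" guess concrete, here is a minimal userspace
sketch of walking such a list with a bounds check before each name_len
read. The layout is an assumption modelled on struct fea in
fs/cifs/cifspdu.h (flags byte, name_len at offset 1, 16-bit
little-endian value_len at offset 2, then the NUL-terminated name and
the value). fea_sketch and count_eas_checked are hypothetical names,
the packed attribute is GCC/Clang-specific, and this is neither the
kernel's code nor a proposed fix.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed on-the-wire layout of one extended-attribute entry. */
struct fea_sketch {
    uint8_t  ea_flags;
    uint8_t  name_len;   /* offset 1: the byte read by movzbl 0x1(%ebx) */
    uint16_t value_len;  /* offset 2: little-endian on the wire */
    char     name[1];    /* name_len bytes + NUL, then value_len bytes */
} __attribute__((packed));

_Static_assert(offsetof(struct fea_sketch, name_len) == 1,
               "name_len is expected at offset 1");
_Static_assert(offsetof(struct fea_sketch, value_len) == 2,
               "value_len is expected at offset 2");

/* Count the entries in buf[0..len) without ever reading past the end.
 * Returns the number of entries, or -1 if the list is malformed. */
static int count_eas_checked(const uint8_t *buf, size_t len)
{
    const size_t hdr = offsetof(struct fea_sketch, name); /* 4 bytes */
    size_t off = 0;
    int count = 0;

    while (off < len) {
        /* The fixed header must fit before name_len is touched. */
        if (len - off < hdr)
            return -1;

        size_t name_len = buf[off + 1];
        size_t value_len = (size_t)buf[off + 2] |
                           ((size_t)buf[off + 3] << 8);

        /* Header, name, trailing NUL, then the value. */
        size_t entry = hdr + name_len + 1 + value_len;
        if (entry > len - off)
            return -1;

        off += entry;
        count++;
    }
    return count;
}
```

The per-entry size used here (4 + name_len + 1 + value_len) is the same
quantity the listing subtracts from %edx at 5b24..5b29, which is at
least consistent with the guess about what the loop is walking.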

-- 
Jeff Layton <jlayton at redhat.com>