Need help, Kernel panic: "Aiee, killing interrupt handler!"

David Gibson david at gibson.dropbear.id.au
Tue Jul 16 14:34:06 EST 2002


On Mon, Jul 15, 2002 at 10:52:12PM -0400, Christopher Abiad wrote:
> We (a group of 4 computer engineering students, including myself) have run 
> into problems while trying to build a robust encryption extension to the 
> orinoco Linux module for a school project.  The goal is to replace WEP for 
802.11b communications in a GPL'd solution. Our project is due this Friday 
> and we desperately need your help.

Frankly I think this is better done at a level higher than the MAC -
with CIPE, IPSEC, or even ssh tunnels.  If it must be done at the MAC
level 802.1x is probably worth looking at.  I haven't looked at the
standard - I have no doubt it is a horrible designed-by-committee
mess, just like the rest of IEEE802, but at least it is a standard.

> In the current prototype, we call encrypt and decrypt functions in the 
> orinoco_xmit and orinoco_ev_rx functions respectively.  The system by and 
> large is already somewhat functional.  The problem is that *under heavy 
> load* (we believe it may have to do with sending large packets), the 
> kernel soon displays an Oops message, then immediately panics.  This panic 
> brings down the interrupt handler with the following message following the 
> Oops dump:
> 
> "<0>Kernel panic: Aiee, killing interrupt handler!
> In interrupt handler - not syncing"

Ah, yes, well, yet again we see the limitations of the kernel's oops
handling.  Just like every other oops that's ever been mailed to me,
this one looks bogus.  Chances are you have the wrong ksyms - since
you're loading as a module it's essential that you take a copy of
/proc/ksyms after loading all the modules, but before triggering the
oops, and use that to pass to ksymoops.

> As background, we've purchased a pair of Orinoco Silver cards and set up 
> 2.4.18 based systems with the 0.12b version of the orinoco driver for 
> development.  We tested with an unmodified driver and had no problems with 
> it (except a KERN_WARNING that hermes_init has been called more than 
> once), even under heavy load.

Hmm... I suggest you try 0.11b.  In 0.12b we're still ironing out
problems with the new locking scheme.

> Unfortunately, since the interrupt handler has been killed after the oops 
> above, writing the oops to the HD is impossible.  We have written two Oops 
> dumps down and passed them through ksymoops.  The output from ksymoops is 
> attached to this file.  It reveals that the assembly instruction "ud2a" 
> (0f 0b, the guaranteed invalid instruction) was called by the function 
> kmem_cache_destroy at 152/310.

As I said, the traceback looks bogus - why would kmalloc() call
hermes_bap_pwrite?

> 1. What sort of time restrictions are there within the orinoco_xmit and 
> orinoco_ev_rx methods?  Do we have time to perform streaming encryption 
> and decryption operations?  (we've achieved, for several seconds, 
> encrypted transfer speeds at 190KByte/sec before a panic)  Would a 
> function that takes too long to complete exhibit these symptoms?

Well, they shouldn't take too long, especially orinoco_ev_rx which
operates in hard interrupt context.  In particular they must not
sleep.  Sleeping from interrupt (or softirq) context is the most
likely cause of the "Aiee" message.
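
To illustrate the "must not sleep" rule, here is a minimal userspace
sketch (not kernel code, and not taken from the orinoco driver) of one
way to keep a per-packet path free of blocking calls: set up the scratch
buffer once, outside the hot path, and have the hot path fail rather
than wait when the buffer is unavailable.  The names crypt_ctx,
crypt_scratch_get, crypt_scratch_put and the SCRATCH_SIZE value are all
hypothetical.  In a real driver the analogous choice would be
preallocating at open time, or passing GFP_ATOMIC rather than GFP_KERNEL
to kmalloc().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical size: assumed to be larger than any frame body we handle. */
#define SCRATCH_SIZE 2400

/* Hypothetical per-device context holding a preallocated scratch buffer. */
struct crypt_ctx {
    unsigned char scratch[SCRATCH_SIZE];
    int in_use;
};

/* Setup path (think "open", process context): slow work is fine here. */
void crypt_ctx_init(struct crypt_ctx *ctx)
{
    memset(ctx, 0, sizeof(*ctx));
}

/*
 * Per-packet path (think "interrupt context"): no allocation, no waiting.
 * Returns NULL instead of blocking if the buffer is busy or too small.
 */
unsigned char *crypt_scratch_get(struct crypt_ctx *ctx, size_t len)
{
    if (ctx->in_use || len > SCRATCH_SIZE)
        return NULL;
    ctx->in_use = 1;
    return ctx->scratch;
}

/* Release the scratch buffer once the packet has been processed. */
void crypt_scratch_put(struct crypt_ctx *ctx)
{
    ctx->in_use = 0;
}
```

A caller that gets NULL back would drop or defer the packet; the point
is only that nothing on this path can sleep.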

> 2. Are there other places in the driver code that we would have to modify 
> in order to be able to encrypt and decrypt packet bodies reliably?  For 
> example, if the card is reset, could anything happen that would be 
> confused if the data in the skb wasn't identical to the way that user mode 
> sent it, or that was different from the way it was received originally?

I don't understand the question.

> 3. We've verified that an orinoco_ev_rx function can interrupt the 
> orinoco_xmit function (which makes sense!), though this doesn't seem to 
> cause us any problems.  We don't think that any other cases of 
> interruption can occur.  Are we missing anything?

Actually in 0.12b, the rx function shouldn't interrupt xmit - because
xmit should take the lock with interrupts disabled.
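
As a rough userspace analogy only (this is not the driver's code), the
effect of taking a lock with interrupts disabled can be modelled with
POSIX signals: block the signal for the duration of the critical
section, and the "interrupt" stays pending until the mask is restored.
In the kernel one would typically reach for something like
spin_lock_irqsave() and spin_unlock_irqrestore(); every fake_* name
below is hypothetical, and SIGUSR1 merely stands in for the rx
interrupt.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>

/* Counts simulated "rx interrupts" that have actually been handled. */
static volatile sig_atomic_t rx_events;

static void fake_rx_handler(int sig)
{
    (void)sig;
    rx_events++;
}

/* Install the simulated rx interrupt handler; returns 0 on success. */
int install_fake_rx(void)
{
    struct sigaction sa;

    sa.sa_handler = fake_rx_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Analogue of "take the lock with interrupts disabled". */
void fake_lock_irqsave(sigset_t *saved)
{
    sigset_t block;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, saved);
}

/* Analogue of "release the lock and restore the interrupt state". */
void fake_unlock_irqrestore(const sigset_t *saved)
{
    sigprocmask(SIG_SETMASK, saved, NULL);
}

/* Returns how many rx events ran inside the critical section. */
int fake_xmit(void)
{
    sigset_t saved;
    int before, during;

    fake_lock_irqsave(&saved);
    before = rx_events;
    raise(SIGUSR1);                 /* "interrupt" arrives mid-transmit */
    during = rx_events - before;    /* handler has not run: still pending */
    fake_unlock_irqrestore(&saved); /* pending signal is delivered here */
    return during;
}
```

In this single-threaded model the handler only runs once the mask is
restored, which is the behaviour described above for xmit versus rx.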

> 4. Why might modifying an skb directly before/after a transmit/receive 
> cause the panic to occur as soon as we load the modules (insmod orinoco_cs 
> completes)?  In a test, we attempted to skip the actual encryption and 
> replace it with a simple XOR of some constant with the sk_buff body 
> directly (not headers).  When we made a copy of the data in the char *p in 
> orinoco_xmit and orinoco_ev_rx it didn't fail on open, but still failed on 
> heavy TX/RX.

No idea.  Or it might not be that.  Frequently the devil is in the
details.  That's why debugging is hard.
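
For what it's worth, the XOR test described in the question can be
exercised entirely in userspace, which at least checks the transform
itself away from the driver.  The sketch below is an assumption-laden
model, not driver code: HDR_LEN and XOR_KEY are hypothetical values, and
a plain byte array stands in for the packet data.  It shows the two
variants mentioned, modifying the buffer in place and working on a copy.
The copy variant leaves the caller's bytes untouched, which matters if
anything else still refers to the original data.

```c
#include <stddef.h>
#include <string.h>

#define HDR_LEN 14    /* hypothetical header length, left untouched */
#define XOR_KEY 0x5A  /* hypothetical constant */

/* XOR the body (everything after the header) in place. */
void xor_body_inplace(unsigned char *frame, size_t len)
{
    size_t i;

    for (i = HDR_LEN; i < len; i++)
        frame[i] ^= XOR_KEY;
}

/*
 * XOR the body into a separate buffer, leaving the input unmodified.
 * Returns 0 on success, -1 if the output buffer is too small.
 */
int xor_body_copy(const unsigned char *frame, size_t len,
                  unsigned char *out, size_t out_len)
{
    if (out_len < len)
        return -1;
    memcpy(out, frame, len);
    xor_body_inplace(out, len);
    return 0;
}
```

Since XOR with a constant is its own inverse, applying either function
twice should give back the original bytes, which makes a convenient
sanity check before putting the same logic anywhere near the tx/rx
paths.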

> 5. Our encryption is loaded in a separate module with relevant functions 
> available as exported symbols.  Are there any considerations to this 
> design approach that we may be missing?
> 
> In addition to the ksymoops output, the full current working source is 
> included in this email.  The rijndael encryption implementation is not 
> GPL, regardless of what the license is set to in the module source, but 
> is free for non-commercial use.

Then you shouldn't set the module license to GPL.  That's grossly
misleading.

> Any and all help or suggestions will be welcome.  If you're knowledgeable 
> about these drivers, we're begging you... please help!  We'll be deeply 
> indebted to you.  If you require any further information on our troubles 
> (or about our project in general) I will provide it as soon as possible.
> 
> Thanks in advance,
> chris abiad
> Computer Engineering, Class of 2003
> University of Waterloo
> Waterloo, ON, Canada
> 





-- 
David Gibson			| For every complex problem there is a
david at gibson.dropbear.id.au	| solution which is simple, neat and
				| wrong.
http://www.ozlabs.org/people/dgibson



