kernel very unstable

Pete Buechler <peter.buechler@home.com>
Mon, 12 Mar 2001 21:48:34 -0700


On Sunday 11 March 2001 02:51 pm, Lucas Vogel wrote:
> > > My stock SuSE 2.2.16 kernel keeps intermittently crashing and hosing my
> > > machine to where I have to do a hard reboot. It seems to generate some kind
> > > of oops on the kmem_free function.
> >
> > In general, any regular crashing by a stock kernel points to hardware
> > problems on your end.  No kernel that ships with any of the major
> > distributions will give these kind of problems on halfway decent
> > hardware.  In general.
>
> How do I diagnose and fix something like this then?
>

Yikes. I did not see any replies from anybody who is an expert. So you will 
have to take my advice.

First examine your syslogs. You already did that, because you told us that 
you saw MARK in there a bunch of times. Anything else of use in there?
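
If the log is long, a small filter can skip the MARK heartbeats and show only 
lines that look like trouble. This is a rough sketch, not a proper tool: the 
pattern list and the function name interesting_lines are my own guesses at 
what is worth looking for, so adjust them to your system.

```python
import re

# Assumption: these strings are only a guess at what kernel or disk trouble
# looks like in a syslog; extend the list for your own hardware.
SUSPICIOUS = re.compile(
    r"oops|panic|unable to handle kernel|DMA|CRC|timeout|I/O error",
    re.IGNORECASE,
)


def interesting_lines(lines):
    """Return lines that are not '-- MARK --' heartbeats and match SUSPICIOUS."""
    hits = []
    for line in lines:
        if "-- MARK --" in line:
            continue  # syslogd heartbeat, carries no information
        if SUSPICIOUS.search(line):
            hits.append(line.rstrip("\n"))
    return hits
```

To use it, pass in an open log file, for example 
interesting_lines(open("/var/log/messages")), assuming that is where your 
syslog ends up, and print whatever comes back.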

Second try to think of what you were doing at the time of the crashes. See if 
that gives you any ideas.

Third, if you have any Western Digital drives, use hdparm to turn off DMA for 
them. Their implementation of UDMA-66 ignores CRC problems (dumb).

Fourth, make sure that you do not have any IRQ conflicts. Make sure you note 
which IRQs are used by what and double-check this by looking at the contents 
of /proc/interrupts.
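
For the IRQ check, something like the following can list the IRQs that more 
than one device has claimed. It is only a sketch and assumes the usual layout 
of /proc/interrupts (a CPU header line, then one line per IRQ with the counts, 
the controller type and a comma-separated device list). The function name 
shared_irqs is made up for this example. A shared line is not automatically a 
conflict, but it tells you where to look first.

```python
def shared_irqs(text):
    """Map IRQ number -> device names for IRQs claimed by more than one device."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header line: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip non-numeric rows such as NMI and ERR
        # Expected fields: one count per CPU, controller type, device list.
        fields = rest.split(None, ncpus + 1)
        if len(fields) < ncpus + 2:
            continue
        devices = [d.strip() for d in fields[-1].split(",")]
        if len(devices) > 1:
            shared[int(label)] = devices
    return shared
```

Feed it the file contents, for example shared_irqs(open("/proc/interrupts").read()), 
and compare the result against your own notes on which card uses which IRQ.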

Fifth, capture the oops (maybe with a pencil and paper) and run it through 
ksymoops (look for directions with the Linux kernel source code, under the 
Documentation directory in a file called oops-tracing.txt). If you can figure 
out where the code was when it crashed maybe you can get a hint as to the 
problem.

Sixth, see if any diagnostics came with your hardware. Or go to the web sites 
of the companies that made it and look for diagnostics there.

Seventh, strip your system down to the bare essentials - monitor, keyboard, 
mouse, motherboard and one drive. See how that runs. Then add hardware back 
in one piece at a time and see what causes the destabilization.

Hardware problems can be a real pain to diagnose. I say we all go out and get 
computers built with self-checking pairs. That will help stimulate the 
economy :-)

-Pete-