Linux Stability Question

Phil Mattison plug-discuss@lists.plug.phoenix.az.us
Sat, 10 Aug 2002 12:25:51 -0700


Thanks, I'll try (some of) that. --Phil M.
-------------------------------
Message: 14
Date: Sat, 10 Aug 2002 14:50:12 -0400
From: George Toft <george@georgetoft.com>
To: plug-discuss@lists.plug.phoenix.az.us
Subject: Re: Linux Stability Question
Reply-To: plug-discuss@lists.plug.phoenix.az.us

This is going to be a fun one.

You need to set up some monitoring to see what's going on inside the
box.  How many processes are running? How much CPU are the top 10 taking
up?  How much RAM is there, and are you paging excessively?  What is the
load average?  What is the CPU temperature?  How many file descriptors
are you using, and are your network connections being cleared in a
timely fashion?  Once you track these items, record them each minute to
a log.  Next time it happens, look at the log.  
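
A minimal sketch of that kind of once-a-minute logging, assuming a Linux
box with /proc mounted. It is an illustration, not the setup described
in this message. The function names and the log path are made up for
the example, and the /sys/class/thermal path only exists on newer
kernels (older boxes would need something like lm_sensors instead). It
covers load average, task count, free RAM, swap in use (a rough proxy
for paging), open file handles, and temperature. Per-process CPU and
network connection counts are left out.

```python
import glob
import time


def parse_loadavg(text):
    """Parse /proc/loadavg: 1/5/15-minute load and total task count."""
    fields = text.split()
    one, five, fifteen = (float(x) for x in fields[:3])
    _running, total = fields[3].split("/")
    return one, five, fifteen, int(total)


def parse_meminfo(text):
    """Parse /proc/meminfo into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr: (allocated handles, system maximum)."""
    fields = text.split()
    return int(fields[0]), int(fields[2])


def read_temps():
    """Read temperatures in Celsius from sysfs; empty list if unavailable."""
    temps = []
    for path in sorted(glob.glob("/sys/class/thermal/thermal_zone*/temp")):
        try:
            with open(path) as f:
                temps.append(int(f.read().strip()) / 1000.0)  # millidegrees
        except (OSError, ValueError):
            pass
    return temps


def format_sample(stamp, loadavg_text, meminfo_text, file_nr_text, temps):
    """Build one log line from the raw text of the /proc files."""
    one, five, fifteen, tasks = parse_loadavg(loadavg_text)
    mem = parse_meminfo(meminfo_text)
    open_files, max_files = parse_file_nr(file_nr_text)
    swap_used = mem.get("SwapTotal", 0) - mem.get("SwapFree", 0)
    temp_text = ",".join("%.1f" % t for t in temps) or "n/a"
    return (
        "%s load1=%.2f load5=%.2f load15=%.2f tasks=%d "
        "memfree_kb=%d swapused_kb=%d files=%d/%d temp_c=%s"
        % (stamp, one, five, fifteen, tasks, mem.get("MemFree", 0),
           swap_used, open_files, max_files, temp_text)
    )


def read_text(path):
    with open(path) as f:
        return f.read()


def log_forever(log_path, interval=60):
    """Append one sample per interval (seconds) to log_path."""
    while True:
        line = format_sample(
            time.strftime("%Y-%m-%d %H:%M:%S"),
            read_text("/proc/loadavg"),
            read_text("/proc/meminfo"),
            read_text("/proc/sys/fs/file-nr"),
            read_temps(),
        )
        with open(log_path, "a") as log:
            log.write(line + "\n")  # reopened each time so data survives a crash
        time.sleep(interval)


# Example (hypothetical path): log_forever("/var/log/boxstats.log")
```

After the next slowdown, the last few lines of the log show whether
load, swap use, or temperature was climbing beforehand.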

I just spent several weeks in this exercise to track down *two* heat
issues.  My box would run for 3-6 days, then slow way down and thrash
the hard drive.  If I restarted X, I could salvage the machine about 1/2
the time.  Otherwise, I had to reset the computer and suffer the fsck
from hell.

Final analysis on my problem?  Video chip on my $25 S3 "nothing special"
video card was overheating, and my CPU was overheating.  I put a fan
blowing across the video card (it was not made to be cooled), and put a
2 GHz cooler (overkill) on my 600 MHz CPU.  Box has been rock solid ever
since.

George
--------------------------------
--
Phil Mattison
Ohmikron Corp.
480-722-9595
602-820-9452 Mobile