Linux Stability Question

George Toft plug-discuss@lists.plug.phoenix.az.us
Sat, 10 Aug 2002 14:50:12 -0400


This is going to be a fun one.

You need to set up some monitoring to see what's going on inside the
box.  How many processes are running?  How much CPU are the top 10
processes using?  How much RAM is there, and are you paging excessively?
What is the load average?  What is the CPU temperature?  How many file
descriptors are you using, and are your network connections being
cleared in a timely fashion?  Once you can measure these items, record
them to a log every minute.  The next time the box slows down, look at
the log.
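Here is a rough sketch of that kind of per-minute logging in Python.  It
reads standard Linux /proc files and appends one timestamped line per
minute.  This is not the setup I used.  The script name, the log format,
and the choice of fields are assumptions for illustration.  Free swap
stands in for "are you paging", and CPU temperature and per-process CPU
usage are left out because they depend on your sensor hardware and tools
such as lm_sensors or ps.

```python
#!/usr/bin/env python3
# healthlog.py (hypothetical name): log basic health numbers once a minute.
# Illustrative sketch only; field choice and log format are assumptions.
import os
import sys
import time

INTERVAL = 60  # seconds between samples

# /proc/net/tcp stores each socket's state as a hex code in column 4.
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "08": "CLOSE_WAIT"}


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg."""
    return tuple(float(x) for x in text.split()[:3])


def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of ints (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            try:
                info[key.strip()] = int(fields[0])
            except ValueError:
                pass  # skip non-numeric lines (e.g. older kernel headers)
    return info


def parse_file_nr(text):
    """Return (allocated, max) file handles from /proc/sys/fs/file-nr."""
    allocated, _unused, maximum = (int(x) for x in text.split()[:3])
    return allocated, maximum


def count_tcp_states(text):
    """Count IPv4 sockets per state in /proc/net/tcp (header skipped)."""
    counts = {name: 0 for name in TCP_STATES.values()}
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) > 3 and fields[3] in TCP_STATES:
            counts[TCP_STATES[fields[3]]] += 1
    return counts


def read(path):
    with open(path) as f:
        return f.read()


def sample():
    """Collect one snapshot of the numbers worth watching."""
    load = parse_loadavg(read("/proc/loadavg"))
    mem = parse_meminfo(read("/proc/meminfo"))
    fds, fd_max = parse_file_nr(read("/proc/sys/fs/file-nr"))
    tcp = count_tcp_states(read("/proc/net/tcp"))
    procs = sum(1 for d in os.listdir("/proc") if d.isdigit())
    fields = [
        "load=%.2f/%.2f/%.2f" % load,
        f"procs={procs}",
        f"memfree_kb={mem.get('MemFree')}",
        f"swapfree_kb={mem.get('SwapFree')}",
        f"fds={fds}/{fd_max}",
    ]
    fields += [f"{state}={n}" for state, n in tcp.items()]
    return " ".join(fields)


def main(log_path):
    """Append one timestamped line to log_path every INTERVAL seconds."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(log_path, "a") as log:
            log.write(f"{stamp} {sample()}\n")
            log.flush()
            os.fsync(log.fileno())  # force to disk so a hard reset keeps it
        time.sleep(INTERVAL)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

You would run it as something like "python3 healthlog.py /var/log/health.log"
(any writable path should do).  Each line is synced to disk because a
box that hangs usually ends with a hard reset, and you want the last few
minutes before the hang to survive it.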

I just spent several weeks on this exercise tracking down *two* heat
issues.  My box would run for 3-6 days, then slow way down and thrash
the hard drive.  If I restarted X, I could salvage the machine about
half the time.  Otherwise, I had to reset the computer and suffer the
fsck from hell.

Final analysis on my problem?  The video chip on my $25 S3 "nothing
special" video card was overheating, and so was my CPU.  I put a fan
blowing across the video card (it was not designed to be cooled), and
put a 2 GHz cooler (overkill) on my 600 MHz CPU.  The box has been rock
solid ever since.

George


> Phil Mattison wrote:
> 
> Here's a question for the gurus of Linux Dark Mojo:
> I have a RH Linux 7.2 box in my office I'm using for PHP/MySQL
> development.
> I'm running PHP 4.0.6, MySQL 3.23.41, Samba 2.2.1a, and Apache 1.3.20,
> on a PIII-600MHz box. I have several virtual hosts configured under
> the Apache server.
> I use HomeSite 4.5 on a Win98 box to edit the source files,
> and work directly in the Apache server web directories via SMB.
> Normally it all works like greased lightning, but sometimes after
> a week or so of continuous operation, the Linux/Apache machine starts
> to slow down and eventually stops responding. Sometimes restarting
> httpd or the network fixes it, and sometimes I have to reboot to get
> it back up.
> I'm not complaining; it's still a hell of a lot better than rebooting
> twice a day with Windows. But my production server is solid as a rock
> and is configured nearly identically, so I'm wondering what I'm doing
> wrong with the local machine. Any ideas?
> --
> Phil Mattison
> Ohmikron Corp.
> 480-722-9595
> 602-820-9452 Mobile