Curious how your test turned out.

 

You may also want to run iostat with its output going to a file and see whether it correlates with the slow responses.
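A minimal sketch of one way to capture that, in Python. The helper name `run_to_file` and the log file name are made up for illustration, and the iostat invocation is an assumption: `-x` for extended statistics, `-t` so each report carries its own timestamp, and a 10-second interval. It needs the sysstat package installed.

```python
import subprocess

# Assumed invocation: extended stats (-x), a timestamp per report (-t), every 10 seconds.
IOSTAT_CMD = ["iostat", "-x", "-t", "10"]


def run_to_file(cmd, out_path, timeout=None):
    """Run cmd, appending its stdout and stderr to out_path.

    Returns the exit code, or None if the command was stopped
    because `timeout` seconds elapsed.
    """
    with open(out_path, "a") as out:
        proc = subprocess.Popen(cmd, stdout=out, stderr=subprocess.STDOUT)
        try:
            return proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # Stop the command cleanly once the capture window is over.
            proc.terminate()
            proc.wait()
            return None
```

Something like `run_to_file(IOSTAT_CMD, "iostat.log", timeout=8 * 3600)` would collect a night's worth of reports, which can then be lined up against the times the server went quiet.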

 

However, that ‘bulging capacitor’ thing others have mentioned sounds like a pretty convincing coincidence, as it were.

(I will say that USUALLY I’d agree with JD: being I/O bound or low on RAM (and thus I/O bound on swap space) are the only things I’ve seen cause unresponsiveness. I’ve never seen an overheat slow a machine down; usually it just dies suddenly. I probably get fast overheating and not slow increases in heat levels :-) )

OH!  WAIT!  I just remembered another event, and it WON’T show up in normal performance logs.  If you send a command to a disk drive and it goes busy for a long time, your system can become totally locked until the timeout happens and the kernel gives up.  If that happens, there SHOULD be a timeout recorded in the syslog or /var/log/messages; check there for timeouts on disk drives, hard resets, or the like.  (I know this because of where I work :-) )  Disk drives are supposed to acknowledge a command almost immediately, and it is almost always a bad sign when the drive takes the command but does not finish the initial command handshake sequence.  You might also want to look at the S.M.A.R.T. attributes for your drives to see if any of them are showing ‘pre-fail’ conditions.
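A rough sketch of that log check in Python, in case it saves some scrolling. The keyword list is only a guess at what such entries might contain; the actual kernel message wording varies by driver and kernel version, so adjust the pattern to whatever your logs really say. The function names are made up for illustration.

```python
import re

# Assumed keywords only; real kernel messages differ between drivers and versions.
SUSPECT = re.compile(r"timeout|timed out|hard reset|resetting", re.IGNORECASE)


def suspect_lines(lines):
    """Yield (line_number, text) for every line that matches SUSPECT."""
    for number, line in enumerate(lines, start=1):
        if SUSPECT.search(line):
            yield number, line.rstrip("\n")


def scan_file(path="/var/log/messages"):
    """Return all suspect lines found in the log file at `path`."""
    with open(path, errors="replace") as handle:
        return list(suspect_lines(handle))
```

Reading /var/log/messages usually needs root. Any hits are worth comparing against the times the server stopped responding.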

 

Rusty

 

From: plug-discuss-bounces@lists.plug.phoenix.az.us [mailto:plug-discuss-bounces@lists.plug.phoenix.az.us] On Behalf Of Mark Phillips
Sent: Monday, June 25, 2012 10:13 PM
To: Main PLUG discussion list
Subject: Re: Strange Server Behavior

 

Right now, the server is not doing anything but sitting there....

Tasks:  98 total,   1 running,  97 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1033780k total,   217560k used,   816220k free,     6220k buffers
Swap:  2019320k total,        0k used,  2019320k free,    94056k cached

Plenty of swap, not very busy. It may be overheating, but I am not sure why.

I am going to run a test tonight: ping every 10 seconds and timestamp the output into a file. Perhaps I will see gaps or unusually long response times that I can correlate with the log files.
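One way to find the gaps in such a file afterwards, sketched in Python. It assumes a hypothetical layout where every line starts with a `YYYY-MM-DD HH:MM:SS` timestamp; the function names, the format, and the 5-second tolerance are all assumptions to adapt to the real file.

```python
from datetime import datetime

# Assumed timestamp layout at the start of each log line.
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
STAMP_LENGTH = 19  # length of a timestamp written in STAMP_FORMAT


def parse_stamps(lines):
    """Return the leading timestamp of each line, skipping lines without one."""
    stamps = []
    for line in lines:
        try:
            stamps.append(datetime.strptime(line[:STAMP_LENGTH], STAMP_FORMAT))
        except ValueError:
            continue  # not a timestamped line
    return stamps


def find_gaps(stamps, expected_interval=10.0, tolerance=5.0):
    """Return (previous, current, seconds) wherever two consecutive
    timestamps are further apart than expected_interval + tolerance."""
    gaps = []
    for previous, current in zip(stamps, stamps[1:]):
        seconds = (current - previous).total_seconds()
        if seconds > expected_interval + tolerance:
            gaps.append((previous, current, seconds))
    return gaps
```

Each gap gives a start and end time to look up in the system logs.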

Mark

On Mon, Jun 25, 2012 at 10:09 PM, JD Austin wrote:

I've had servers that act like that. Usually they're overheating, completely I/O bound, or swapping due to low available memory.

On Mon, Jun 25, 2012 at 10:00 PM, Mark Phillips <mark@phillipsmarketing.biz> wrote:

Nope - everything just stops: ping waits for a response, web services just wait for the server, file transfers stop and wait... as if time just stopped for the server, and then it starts again without any errors being evident.

Mark

 

On Mon, Jun 25, 2012 at 9:57 PM, Stephen wrote:

Can you access any other services hosted by the server during this time? Or even run an extended ping?

On Jun 25, 2012 9:53 PM, "Mark Phillips" wrote:

I have a headless server running Linux version 2.6.32-5-686 (Debian 2.6.32-45) (dannf@debian.org) (gcc version 4.3.5 (Debian 4.3.5-4)) with no X or window manager. I have noticed over the past couple of days that when I ssh into the server, it occasionally stops responding for a minute or two, then comes back as if nothing had happened. It is a random event, maybe once an hour. I cannot find anything in the logs: no error messages. There is nothing wrong with the machine where I initiated the ssh session, and the problem is not specific to ssh. The server completely stops responding, then comes back as if nothing had happened.

How would I go about diagnosing this problem?

Thanks,