My ping test is not what I expected but still shows a problem.

I set up the test to ping (64 bytes, ttl=64) the problem server every 10 seconds from my laptop. Both are plugged in and on the same subnet. The boxes are about 5 feet apart. Here are the results:

3434 packets transmitted, 3307 received, 3% packet loss, time 34336321ms
rtt min/avg/max/mdev = 0.100/0.231/4.332/0.275 ms
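
As a sanity check, the summary figures can be reworked by hand. This is only a rough sketch: the variable names are made up for illustration, and both the integer truncation of the percentage and the "N-1 intervals" reading of the elapsed time are assumptions about how ping reports its numbers.

```python
# Figures copied from the ping summary above.
transmitted = 3434
received = 3307
elapsed_ms = 34336321

lost = transmitted - received              # packets that never got a reply
loss_pct = 100.0 * lost / transmitted      # exact loss percentage

# Assumption: ping prints the percentage as a truncated integer.
reported_pct = (100 * lost) // transmitted

# Assumption: the elapsed time covers roughly N-1 send intervals.
interval_s = elapsed_ms / (transmitted - 1) / 1000.0

print(f"lost={lost} loss={loss_pct:.2f}% "
      f"reported={reported_pct}% interval={interval_s:.3f}s")
```

That works out to 127 lost packets (about 3.7%) and an average send interval of about 10 seconds, which matches the test setup.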

I found 38 instances where the time stamps for the pings "hiccuped" and there was a delay. Each hiccup lasted (min/avg/max) 10/44/70 seconds. The time between hiccups was (min/avg/max) 1:20/14:29/29:00 minutes.
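
For anyone who wants to repeat this kind of analysis, here is one possible way to pull the hiccups out of a list of reply time stamps. It is a minimal sketch, not the method actually used above: the function names, the 5 second slack, and the idea that the stamps are already parsed into datetime objects are all assumptions.

```python
from datetime import datetime, timedelta

def find_hiccups(stamps, interval=timedelta(seconds=10),
                 slack=timedelta(seconds=5)):
    """Return (start, length) for every gap between consecutive replies
    that is longer than the ping interval plus some slack."""
    hiccups = []
    for prev, cur in zip(stamps, stamps[1:]):
        gap = cur - prev
        if gap > interval + slack:
            # Count only the part of the gap beyond one normal interval.
            hiccups.append((prev, gap - interval))
    return hiccups

def summarize(durations):
    """Return (min, avg, max) in seconds for a non-empty list of timedeltas."""
    secs = [d.total_seconds() for d in durations]
    return min(secs), sum(secs) / len(secs), max(secs)
```

Calling summarize() on the lengths returned by find_hiccups() gives the min/avg/max figures; running it on the differences between successive start times gives the time between hiccups.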

I grep'd the log files for messages around these 38 incidents, but did not find any messages in any of the logs. However, grep alone is not a good test, so I also trawled through the logs by hand and didn't find anything significant.
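
A more systematic alternative to grepping by hand would be to select every log line stamped within a window around each incident. The following is a hedged sketch only: it assumes the classic syslog stamp ("Jun 26 09:35:01") at the start of each line, and the function names and the 90 second window are invented for illustration.

```python
from datetime import datetime, timedelta

def parse_syslog_time(line, year):
    """Parse the leading 'Jun 26 09:35:01' stamp of a classic syslog line.
    Classic syslog omits the year, so the caller has to supply it."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def lines_near(log_lines, incidents, year, window=timedelta(seconds=90)):
    """Yield the log lines stamped within `window` of any incident time."""
    for line in log_lines:
        try:
            when = parse_syslog_time(line, year)
        except ValueError:
            continue  # no recognizable syslog stamp at the start of the line
        if any(abs(when - t) <= window for t in incidents):
            yield line
```

The incident times would be the starts of the 38 hiccups. Any line that shows up for several incidents but rarely otherwise is worth a closer look.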

Do these numbers strike a chord with anyone?

I will check the caps on the MB later this week.

I looked in syslog, and could not find any correlation with the 38 hiccups. However, what do these two cron jobs do? They run quite frequently:
CMD (   cd / && run-parts --report /etc/cron.hourly)
CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete)

A search through the log files for "error" does not return anything interesting.

Since Apache and a few other apps were running on the server, I will run the ping test again tonight after I kill everything on the server.

Thanks!

Mark

On Tue, Jun 26, 2012 at 9:35 AM, Carruth, Rusty <Rusty.Carruth@smartstoragesys.com> wrote:

Curious how your test turned out.

 

You may also want to run an iostat to a file and see if that correlates to the slow responses.
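
If iostat is not installed, roughly the same information can be sampled from /proc/diskstats. The sketch below rests on some assumptions: the function names are made up, and it relies on the documented /proc/diskstats layout in which the tenth statistic after the device name is the cumulative number of milliseconds spent doing I/O.

```python
def parse_io_ms(text):
    """Map device name -> cumulative milliseconds spent doing I/O,
    given the contents of /proc/diskstats as a string."""
    busy = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 14:
            # fields[2] is the device name; fields[12] is the tenth
            # statistic after it (time spent doing I/O, in ms).
            busy[fields[2]] = int(fields[12])
    return busy

def utilization(before, after, elapsed_s):
    """Percent of the interval each device was busy, similar in spirit
    to the %util column that iostat -x prints."""
    return {
        dev: 100.0 * (after[dev] - before[dev]) / (elapsed_s * 1000.0)
        for dev in after
        if dev in before
    }
```

Taking a sample every 10 seconds, with a time stamp, and appending the utilization() result to a file gives something that can be lined up against the slow responses.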

 

However, that ‘bulging capacitor’ thing others have mentioned sounds like a pretty convincing coincidence, as it were….

 

(I will say that USUALLY I’d agree with JD: I/O bound or low RAM (and thus I/O bound on swap space) are the only things I’ve seen that cause unresponsiveness. I’ve never seen an overheat slow a machine down; usually it just dies suddenly. I probably get fast overheating and not slow increases in heat levels :) )

 

OH! WAIT! I just remembered another event – and it WON’T show up in normal performance logs. If ‘you’ send a command to a disk drive, and it goes busy for a long time, your system can become totally locked until the timeout happens and the kernel gives up. (If that happens, there SHOULD be a timeout recorded in the syslog or /var/log/messages. Check there for timeouts on disk drives or hard resets or such.) (I know this because of where I work :) ) (Disk drives are supposed to acknowledge the command almost immediately. It is almost always a bad thing when the drive takes the command but does not finish the initial command handshake sequence… You might want to look at the S.M.A.R.T. attributes for your drives as well to see if any of them are showing ‘pre-fail’ conditions.)
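
A search for "error" alone can miss these messages, so a slightly wider net may help. The sketch below is an illustration, not a definitive list: the patterns only approximate typical kernel disk messages (the exact wording varies by kernel version and driver), and the names are made up.

```python
import re

# Assumption: these patterns approximate common kernel disk messages.
# Adjust them to whatever the kernel on the server actually logs.
DISK_TROUBLE = re.compile(
    r"ata\d+(\.\d+)?: .*(timeout|timed out|exception|frozen)"
    r"|hard resetting link"
    r"|I/O error",
    re.IGNORECASE,
)

def disk_trouble_lines(log_lines):
    """Return the log lines that look like disk timeouts or resets."""
    return [line.rstrip("\n") for line in log_lines
            if DISK_TROUBLE.search(line)]
```

Feeding it the lines of /var/log/syslog or /var/log/messages and comparing the time stamps of any hits against the hiccup times would show whether a drive is involved. For the S.M.A.R.T. side, smartctl -A on the drive's device node (from smartmontools, assuming it is installed) lists the attributes.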

 

Rusty

 

From: plug-discuss-bounces@lists.plug.phoenix.az.us [mailto:plug-discuss-bounces@lists.plug.phoenix.az.us] On Behalf Of Mark Phillips
Sent: Monday, June 25, 2012 10:13 PM
To: Main PLUG discussion list
Subject: Re: Strange Server Behavior

 

Right now, the server is not doing anything but sitting there....

Tasks:  98 total,   1 running,  97 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1033780k total,   217560k used,   816220k free,     6220k buffers
Swap:  2019320k total,        0k used,  2019320k free,    94056k cached

Plenty of swap, not very busy. It may be overheating, but I am not sure why.

I am going to run a test tonight - ping every 10 seconds and time stamp the output into a file. Perhaps I will see gaps or unusually long response times and I can correlate that with the log files.
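
One possible shape for such a test is a small wrapper that runs ping and time-stamps each line of output. This is a sketch under assumptions: the function names are made up, and it relies on the ping found on Linux systems accepting -i for the interval in seconds.

```python
import subprocess
from datetime import datetime

def stamp(line, now=None):
    """Prefix one line of ping output with a local time stamp."""
    now = now or datetime.now()
    return f"{now:%Y-%m-%d %H:%M:%S} {line.rstrip()}"

def log_ping(host, out_path, interval=10):
    """Run ping and append each output line, time-stamped, to out_path.
    Runs until ping exits or the script is interrupted."""
    cmd = ["ping", "-i", str(interval), host]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc, \
            open(out_path, "a") as out:
        for line in proc.stdout:
            out.write(stamp(line) + "\n")
            out.flush()  # keep the file current if the run is interrupted
```

It would be started with something like log_ping("192.168.1.10", "ping.log"), where both the address and the file name are placeholders. Gaps between consecutive stamps that are much longer than 10 seconds mark the stalls.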

Mark

On Mon, Jun 25, 2012 at 10:09 PM, JD Austin <> wrote:

I've had servers that act like that. Usually they're overheating, completely I/O bound, or swapping due to low available memory.

 

On Mon, Jun 25, 2012 at 10:00 PM, Mark Phillips <mark@phillipsmarketing.biz> wrote:

Nope - everything just stops - ping waits for a response, web services just wait for the server, file transfers stop and wait.......as if time just stopped for the server, then starts again without any errors being evident.

Mark

 

On Mon, Jun 25, 2012 at 9:57 PM, Stephen < > wrote:

Can you access any other services hosted by the server during this time? Or even an extended ping?

On Jun 25, 2012 9:53 PM, "Mark Phillips" < > wrote:

I have a headless server running Linux version 2.6.32-5-686 (Debian 2.6.32-45) (dannf@debian.org) (gcc version 4.3.5 (Debian 4.3.5-4)) with no X or window manager. In the past couple of days I have noticed that when I ssh into the server, it occasionally stops responding for a minute or two, then comes back as if nothing had happened. It is a random event, maybe once an hour. I cannot find anything in the logs, and there are no error messages. There is nothing wrong with the machine where I initiated the ssh session, and the problem does not seem to be tied to ssh itself. The server completely stops responding, then comes back as if nothing had happened.

How would I go about diagnosing this problem?

Thanks,


 


---------------------------------------------------
PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
To subscribe, unsubscribe, or to change your mail settings:
http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss