Strange Server Behavior

Tue Jun 26 19:42:38 MST 2012

My ping test is not what I expected but still shows a problem.

I set up the test to ping (64 bytes, ttl=64) the problem server every 10
seconds from my laptop. Both are plugged in, on the same subnet. The boxes
are about 5 feet apart. Here are the results:

3434 packets transmitted, 3307 received, 3% packet loss, time 34336321ms
rtt min/avg/max/mdev = 0.100/0.231/4.332/0.275 ms

I found 38 instances where the time stamps for the pings "hiccuped" and
there was a delay. Each hiccup lasted (min/avg/max) 10/44/70 seconds. The
time between hiccups was (min/avg/max) 1:20/14:29/29:00 minutes.
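
For anyone who wants to repeat the measurement, here is a minimal Python
sketch of how gaps like these can be pulled out of a timestamped ping log.
It assumes each reply line starts with a Unix timestamp in square brackets
(the format that iputils "ping -D" prints); the log format I actually used
may differ, and the names find_gaps and min_avg_max, the 2-second slack, and
the sample lines are all made up for illustration.

```python
import re

# Assumption: reply lines look like "[1340764958.12] 64 bytes from ...",
# i.e. a Unix timestamp in square brackets followed by the usual ping reply.
STAMP = re.compile(r"^\[(\d+(?:\.\d+)?)\]")


def find_gaps(lines, interval=10.0, slack=2.0):
    """Return (previous_reply, next_reply, seconds_between) for each hiccup.

    A hiccup is any pair of consecutive replies that are more than
    interval + slack seconds apart.
    """
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.match(line)
        if match is None or "bytes from" not in line:
            continue  # not a reply line (header, summary, or error text)
        stamp = float(match.group(1))
        if previous is not None and stamp - previous > interval + slack:
            gaps.append((previous, stamp, stamp - previous))
        previous = stamp
    return gaps


def min_avg_max(values):
    """Return (min, avg, max) for a non-empty sequence of numbers."""
    return min(values), sum(values) / len(values), max(values)


if __name__ == "__main__":
    # Invented sample data: one 44-second hole in a 10-second ping series.
    sample = [
        "[1000.0] 64 bytes from 192.0.2.10: icmp_seq=1 ttl=64 time=0.231 ms",
        "[1010.0] 64 bytes from 192.0.2.10: icmp_seq=2 ttl=64 time=0.198 ms",
        "[1054.0] 64 bytes from 192.0.2.10: icmp_seq=3 ttl=64 time=0.240 ms",
        "[1064.0] 64 bytes from 192.0.2.10: icmp_seq=4 ttl=64 time=0.210 ms",
    ]
    gaps = find_gaps(sample)
    for start, end, seconds in gaps:
        print(f"gap of {seconds:.0f} s between {start:.0f} and {end:.0f}")
    if gaps:
        print("min/avg/max:", min_avg_max([g[2] for g in gaps]))
```

Note that this measures the time between two consecutive replies, so a
"gap" includes the normal 10-second interval; subtract the interval if you
want only the extra delay.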

I grep'd the log files for messages around these 38 incidents, but did not
find any messages in any of the logs. However, this is not a good test, so I
just trawled the logs and didn't find anything significant.
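
A more systematic version of that check could look something like the
Python sketch below: it keeps only the log lines stamped within a window
around each incident. It assumes the traditional syslog "Mon DD HH:MM:SS"
prefix (which carries no year, so the year has to be supplied) and an
English locale for month names. The function names, the 60-second window,
and the sample log lines are hypothetical.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year):
    """Parse a leading 'Mon DD HH:MM:SS' stamp; return None if there is none."""
    try:
        # The first 15 characters hold the traditional syslog timestamp.
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def lines_near(log_lines, incidents, year, window_seconds=60):
    """Yield (incident, line) for each log line within the window of an incident."""
    window = timedelta(seconds=window_seconds)
    for line in log_lines:
        stamp = parse_syslog_time(line, year)
        if stamp is None:
            continue  # continuation line or unexpected format
        for incident in incidents:
            if abs(stamp - incident) <= window:
                yield incident, line.rstrip("\n")
                break  # report each log line at most once


if __name__ == "__main__":
    # Invented incident time and log lines, for illustration only.
    incidents = [datetime(2012, 6, 26, 19, 17, 30)]
    log = [
        "Jun 26 19:17:01 server CRON[1234]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)",
        "Jun 26 19:30:00 server kernel: unrelated message",
    ]
    for incident, line in lines_near(log, incidents, 2012):
        print(incident.isoformat(), "|", line)
```

The incident times would come from the ping log; widening window_seconds
trades fewer missed messages for more noise.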

Do these numbers strike a chord with anyone?

I will check the caps on the MB later this week.

I looked in syslog, and could not find any correlation with the 38 hiccups.
However, what do these two cron jobs do, since they run quite frequently:
CMD (   cd / && run-parts --report /etc/cron.hourly)
CMD (  [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete)

A search through the log files for "error" does not return anything
interesting.

Since Apache and a few other apps were running on the server, I will run the
ping test again tonight after I kill everything on the server.

Thanks!

Mark

On Tue, Jun 26, 2012 at 9:35 AM, Carruth, Rusty <
Rusty.Carruth at smartstoragesys.com> wrote:

> Curious how your test turned out.
>
> You may also want to run an iostat to a file and see if that correlates to
> the slow responses.
>
> However, that ‘bulging capacitor’ thing others have mentioned sounds like
> a pretty convincing coincidence, as it were….
>
> (I will say that USUALLY I’d agree with JD – I/O bound or low RAM (and thus
> I/O bound on swap space) are the only things I’ve seen that cause
> unresponsiveness. I’ve never seen an overheat slow a machine down; usually it
> just dies suddenly.  I probably get fast overheating and not slow increases
> in heat levels :) )
>
> OH!  WAIT!  I just remembered another event – and it WON’T show up in
> normal performance logs.  If ‘you’ send a command to a disk drive, and it
> goes busy for a long time, your system can become totally locked until the
> timeout happens and the kernel gives up.  (If that happens, there SHOULD be
> a timeout recorded in the syslog or /var/log/messages.  Check there for
> timeouts on disk drives or hard resets or such).  (I know this because of
> where I work :)  (Disk drives are supposed to acknowledge the command
> almost immediately.  It is almost always a bad thing when the drive takes
> the command but does not finish the initial command handshake sequence…
> You might want to look at the S.M.A.R.T. attributes for your drives as well
> to see if any of them are showing ‘pre-fail’ conditions)
>
> Rusty
>
> From: plug-discuss-bounces at lists.plug.phoenix.az.us [mailto:
> plug-discuss-bounces at lists.plug.phoenix.az.us] On Behalf Of Mark
> Phillips
> Sent: Monday, June 25, 2012 10:13 PM
> To: Main PLUG discussion list
> Subject: Re: Strange Server Behavior
>
> Right now, the server is not doing anything but sitting there....
>
> Tasks:  98 total,   1 running,  97 sleeping,   0 stopped,   0 zombie
> Cpu(s):  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Mem:   1033780k total,   217560k used,   816220k free,     6220k buffers
> Swap:  2019320k total,        0k used,  2019320k free,    94056k cached
>
> Plenty of swap, not very busy. It may be overheating, but I'm not sure why.
>
> I am going to run a test tonight - ping every 10 seconds and time stamp
> the output into a file. Perhaps I will see gaps or unusually long response
> times and I can correlate that with the log files.
>
> Mark
>
> On Mon, Jun 25, 2012 at 10:09 PM, JD Austin <jd at twingeckos.com> wrote:
>
> I've had servers that act like that.. usually they're overheating,
> completely I/O bound, or swapping due to low available memory.
>
> On Mon, Jun 25, 2012 at 10:00 PM, Mark Phillips <
> mark at phillipsmarketing.biz> wrote:
>
> Nope - everything just stops - ping waits for a response, web services
> just wait for the server, file transfers stop and wait.......as if time
> just stopped for the server, then starts again without any errors being
> evident.
>
> Mark
>
> On Mon, Jun 25, 2012 at 9:57 PM, Stephen wrote:
>
> Can you access any other services hosted by the server during this
> time? Or even an extended ping?
>
> On Jun 25, 2012 9:53 PM, "Mark Phillips" wrote:
>
> I have a headless server running Linux version 2.6.32-5-686 (Debian
> 2.6.32-45) (dannf at debian.org) (gcc version 4.3.5 (Debian 4.3.5-4)) and no
> X or window manager, and I have noticed in the past couple of days that
> when I ssh into the server it occasionally stops responding for a minute or
> two, then comes back as if nothing had happened. It is a random event -
> maybe once an hour. I cannot find anything in the logs - no error messages.
> There is nothing wrong with the machine where I initiated the ssh session,
> and it is not connected to ssh. The server completely stops responding,
> then comes back as if nothing had happened.
>
> How would I go about diagnosing this problem?
>
> Thanks,
>
> ---------------------------------------------------
> PLUG-discuss mailing list - PLUG-discuss at lists.plug.phoenix.az.us
> To subscribe, unsubscribe, or to change your mail settings:
> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
>