Strange Server Behavior
Mark Phillips
mark at phillipsmarketing.biz
Tue Jun 26 19:42:38 MST 2012
My ping test is not what I expected but still shows a problem.
I set up the test to ping (64 bytes, ttl=64) the problem server every 10
seconds from my laptop. Both are plugged in, on the same subnet. The boxes
are about 5 feet apart. Here are the results:
3434 packets transmitted, 3307 received, 3% packet loss, time 34336321ms
rtt min/avg/max/mdev = 0.100/0.231/4.332/0.275 ms
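
As a rough cross-check of the summary above, the arithmetic can be sketched in a few lines of Python. The three input figures are copied from the ping output. The reading that the reported 3% is a truncated value of roughly 3.7% is an assumption about how this ping version rounds, not something the output itself confirms.

```python
# Figures copied from the ping summary line.
transmitted = 3434
received = 3307
elapsed_ms = 34336321

lost = transmitted - received
loss_pct = 100.0 * lost / transmitted
elapsed_hours = elapsed_ms / 1000 / 3600
# There is one wait between consecutive sends, so (transmitted - 1) intervals.
mean_interval_s = elapsed_ms / 1000 / (transmitted - 1)

print(f"lost packets:  {lost}")
print(f"packet loss:   {loss_pct:.2f}%")
print(f"elapsed:       {elapsed_hours:.2f} hours")
print(f"mean interval: {mean_interval_s:.3f} s")
```

The mean interval comes out very close to the configured 10 seconds, which suggests the laptop kept sending on schedule and only the replies went missing.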
I found 38 instances where the time stamps for the pings "hiccuped" and
there was a delay. Each hiccup lasted (min/avg/max) 10/44/70 seconds. The
time between hiccups was (min/avg/max) 1:20/14:29/29:00 minutes.
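
For anyone who wants to repeat the hiccup count, here is a minimal sketch of one way to find the gaps. It assumes the ping replies were logged one per line with a Unix timestamp in square brackets at the start of the line (the format `ping -D` prints); the function names `find_hiccups` and `summarize`, the 2-second slack, and the sample lines are all made up for illustration and are not the method actually used for the numbers above.

```python
import re
from statistics import mean

# Assumed line format (hypothetical): "[<unix time>] 64 bytes from ...".
STAMP = re.compile(r"^\[(\d+(?:\.\d+)?)\]")


def find_hiccups(lines, interval=10.0, slack=2.0):
    """Return (start_time, gap_seconds) for each pair of consecutive
    replies spaced more than interval + slack seconds apart."""
    stamps = []
    for line in lines:
        match = STAMP.match(line)
        if match:
            stamps.append(float(match.group(1)))
    hiccups = []
    for prev, cur in zip(stamps, stamps[1:]):
        gap = cur - prev
        if gap > interval + slack:
            hiccups.append((prev, gap))
    return hiccups


def summarize(values):
    """Return (min, avg, max) of a non-empty sequence of numbers."""
    return min(values), mean(values), max(values)


if __name__ == "__main__":
    # Made-up sample lines; the address and times are placeholders.
    sample = [
        "[1340700000.000000] 64 bytes from 192.168.1.10: icmp_seq=1 ttl=64 time=0.231 ms",
        "[1340700010.000000] 64 bytes from 192.168.1.10: icmp_seq=2 ttl=64 time=0.220 ms",
        "[1340700054.000000] 64 bytes from 192.168.1.10: icmp_seq=6 ttl=64 time=0.245 ms",
        "[1340700064.000000] 64 bytes from 192.168.1.10: icmp_seq=7 ttl=64 time=0.229 ms",
    ]
    gaps = find_hiccups(sample)
    print(gaps)
    print(summarize([gap for _, gap in gaps]))
```

The gap measured this way includes the normal 10-second wait, so a reported hiccup of 44 seconds means roughly three or four replies in a row never arrived.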
I grep'd the log files for messages around these 38 incidents, but did not
find any messages in any of the logs. However, this is not a good test, so I
just trolled the logs and didn't find anything significant.
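
A more systematic version of the "grep around each incident" check could look like the sketch below. It assumes traditional syslog lines that begin with a `Mon DD HH:MM:SS` stamp, an English locale for month names, and clocks on the laptop and the server that agree to within the chosen window. The names `parse_syslog_time` and `lines_near` and the 90-second window are illustrative choices, not anything taken from the logs themselves.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog-style line.

    Traditional syslog stamps carry no year, so the caller supplies one.
    Returns None when the line does not start with such a stamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def lines_near(log_lines, incidents, year, window=timedelta(seconds=90)):
    """Yield (incident, line) for each log line within `window` of an incident."""
    for line in log_lines:
        stamp = parse_syslog_time(line, year)
        if stamp is None:
            continue
        for incident in incidents:
            if abs(stamp - incident) <= window:
                yield incident, line.rstrip("\n")
                break


if __name__ == "__main__":
    # Made-up log lines and incident time, for illustration only.
    log = [
        "Jun 26 03:14:10 server kernel: example message near the incident",
        "Jun 26 05:00:00 server cron: example message far from it",
    ]
    incidents = [datetime(2012, 6, 26, 3, 15, 0)]
    for incident, line in lines_near(log, incidents, year=2012):
        print(incident, "|", line)
```

If nothing at all is logged inside any of the windows, that is itself a data point: the machine may not be getting far enough to write a log entry while it is stalled.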
Do these numbers strike a chord with anyone?
I will check the caps on the MB later this week.
I looked in syslog, and could not find any correlation with the 38 hiccups.
However, what do these two cron jobs do, since they run quite frequently:
CMD ( cd / && run-parts --report /etc/cron.hourly)
CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find
/var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete)
A search through the log files for "error" does not return anything
interesting.
Since Apache and a few other apps were running on the server, I will run the
ping test again tonight after I kill everything on the server.
Thanks!
Mark
On Tue, Jun 26, 2012 at 9:35 AM, Carruth, Rusty <
Rusty.Carruth at smartstoragesys.com> wrote:
> Curious how your test turned out.
>
> You may also want to run an iostat to a file and see if that correlates to
> the slow responses.
>
> However, that ‘bulging capacitor’ thing others have mentioned sounds like
> a pretty convincing coincidence, as it were…
>
> (I will say that USUALLY I’d agree with JD – being I/O bound or low on RAM
> (and thus I/O bound on swap space) are the only things I’ve seen that cause
> unresponsiveness. I’ve never seen an overheat slow a machine down; usually it
> just dies suddenly. I probably get fast overheating and not slow increases in
> heat levels :) )
>
> OH! WAIT! I just remembered another event – and it WON’T show up in
> normal performance logs. If ‘you’ send a command to a disk drive, and it
> goes busy for a long time, your system can become totally locked until the
> timeout happens and the kernel gives up. (If that happens, there SHOULD be
> a timeout recorded in the syslog or /var/log/messages. Check there for
> timeouts on disk drives or hard resets or such). (I know this because of
> where I work :) ) (Disk drives are supposed to acknowledge the command
> almost immediately. It is almost always a bad thing when the drive takes
> the command but does not finish the initial command handshake sequence…
> You might want to look at the S.M.A.R.T. attributes for your drives as well
> to see if any of them are showing ‘pre-fail’ conditions)
>
> Rusty
>
> From: plug-discuss-bounces at lists.plug.phoenix.az.us [mailto:
> plug-discuss-bounces at lists.plug.phoenix.az.us] On Behalf Of Mark
> Phillips
> Sent: Monday, June 25, 2012 10:13 PM
> To: Main PLUG discussion list
> Subject: Re: Strange Server Behavior
>
> Right now, the server is not doing anything but sitting there....
>
> Tasks: 98 total, 1 running, 97 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 1033780k total, 217560k used, 816220k free, 6220k buffers
> Swap: 2019320k total, 0k used, 2019320k free, 94056k cached
>
> Plenty of swap, not very busy. It may be overheating, but I am not sure why.
>
> I am going to run a test tonight - ping every 10 seconds and time stamp
> the output into a file. Perhaps I will see gaps or unusually long response
> times and I can correlate that with the log files.
>
> Mark
>
> On Mon, Jun 25, 2012 at 10:09 PM, JD Austin <jd at twingeckos.com> wrote:
>
> I've had servers that act like that. Usually they're overheating,
> completely I/O bound, or swapping due to low available memory.
>
> On Mon, Jun 25, 2012 at 10:00 PM, Mark Phillips <
> mark at phillipsmarketing.biz> wrote:
>
> Nope - everything just stops - ping waits for a response, web services
> just wait for the server, file transfers stop and wait.......as if time
> just stopped for the server, then starts again without any errors being
> evident.
>
> Mark
>
> On Mon, Jun 25, 2012 at 9:57 PM, Stephen < > wrote:
>
> Can you access any other services hosted by the server during this
> time? Or even an extended ping?
>
> On Jun 25, 2012 9:53 PM, "Mark Phillips" < > wrote:
>
> I have a headless server running Linux version 2.6.32-5-686 (Debian
> 2.6.32-45) (dannf at debian.org) (gcc version 4.3.5 (Debian 4.3.5-4) and no
> X or window manager, and I have noticed in the past couple of days that
> when I ssh into the server it occasionally stops responding for a minute or
> two, then comes back as if nothing had happened. It is a random event -
> maybe once an hour. I cannot find anything in the logs - no error messages.
> There is nothing wrong with the machine where I initiated the ssh session,
> and it is not connected to ssh. The server completely stops responding,
> then comes back as if nothing had happened.
>
> How would I go about diagnosing this problem?
>
> Thanks,
>
> ---------------------------------------------------
> PLUG-discuss mailing list - PLUG-discuss at lists.plug.phoenix.az.us
> To subscribe, unsubscribe, or to change your mail settings:
> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
>