My ping test did not turn out the way I expected, but it still shows a problem. I set up the test to ping the problem server (64 bytes, ttl=64) every 10 seconds from my laptop. Both are plugged in and on the same subnet. The boxes are about 5 feet apart. Here are the results:

3434 packets transmitted, 3307 received, 3% packet loss, time 34336321ms
rtt min/avg/max/mdev = 0.100/0.231/4.332/0.275 ms

I found 38 instances where the time stamps for the pings "hiccuped" and there was a delay. Each hiccup lasted (min/avg/max) 10/44/70 seconds. The time between hiccups was (min/avg/max) 1:20/14:29/29:00 minutes.

I grep'd the log files for messages around these 38 incidents, but did not find any messages in any of the logs. However, that is not a good test, so I also trawled through the logs by hand and still didn't find anything significant.

Do these numbers strike a chord with anyone?

I will check the caps on the MB later this week.

I looked in syslog and could not find any correlation with the 38 hiccups. However, what do these two cron jobs do, since they run quite frequently?

CMD ( cd / && run-parts --report /etc/cron.hourly)
CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete)

A search through the log files for "error" does not return anything interesting.

Since Apache and a few other apps were running on the server, I will run the ping test again tonight after I kill everything on the server.

Thanks!
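P.S. In case anyone wants to repeat the hiccup analysis, here is a rough sketch of how the gaps could be pulled out of a time-stamped ping log. It is only an illustration, not the script behind the numbers above. It assumes the log came from something like `ping -D -i 10 <server>`, so that every reply line starts with a Unix time stamp in square brackets. The function names (reply_times, find_gaps, summarize) and the 1.5x slack factor are made up for this example.

```python
import re
import sys
from statistics import mean

# Assumed log format (not from the original post): output of `ping -D`,
# where each reply line begins with "[<unix time>] 64 bytes from ...".
STAMP = re.compile(r"^\[(\d+\.\d+)\]\s+\d+ bytes from")


def reply_times(lines):
    """Return the time stamps (in seconds) of every successful reply line."""
    times = []
    for line in lines:
        match = STAMP.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


def find_gaps(times, interval=10.0, slack=1.5):
    """Return (start, extra_delay) pairs for replies spaced too far apart.

    Two consecutive replies count as a hiccup when they are more than
    interval * slack seconds apart; extra_delay is the time beyond the
    normal ping interval.
    """
    gaps = []
    for prev, cur in zip(times, times[1:]):
        if cur - prev > interval * slack:
            gaps.append((prev, cur - prev - interval))
    return gaps


def summarize(values):
    """Return (min, avg, max) of a non-empty sequence of numbers."""
    return min(values), mean(values), max(values)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:
        gaps = find_gaps(reply_times(log))
    print(f"{len(gaps)} hiccups found")
    if gaps:
        lo, avg, hi = summarize([delay for _, delay in gaps])
        print(f"hiccup length min/avg/max = {lo:.0f}/{avg:.0f}/{hi:.0f} s")
        for start, delay in gaps:
            # The start stamps are what one would look up in syslog.
            print(f"{start:.0f}  +{delay:.0f}s")
```

Saved as, say, gaps.py (a hypothetical name), it would be run as `python3 gaps.py ping.log`. The printed start stamps are the times to look for in syslog or /var/log/messages.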
Mark

On Tue, Jun 26, 2012 at 9:35 AM, Carruth, Rusty <Rusty.Carruth@smartstoragesys.com> wrote:

> Curious how your test turned out.
>
> You may also want to run an iostat to a file and see if that correlates to
> the slow responses.
>
> However, that ‘bulging capacitor’ thing others have mentioned sounds like
> a pretty convincing coincidence, as it were….
>
> (I will say that USUALLY I’d agree with JD – Iobound or low RAM (and thus
> iobound on swap space) are the only things I’ve seen that cause
> unresponsiveness (never seen an overheat slow it down, usually it just dies
> suddenly. I probably get fast overheating and not slow increases in heat
> levels :-)
>
> OH! WAIT! I just remembered another event – and it WON’T show up in
> normal performance logs. If ‘you’ send a command to a disk drive, and it
> goes busy for a long time, your system can become totally locked until the
> timeout happens and the kernel gives up. (If that happens, there SHOULD be
> a timeout recorded in the syslog or /var/log/messages. Check there for
> timeouts on disk drives or hard resets or such). (I know this because of
> where I work :-) (Disk drives are supposed to acknowledge the command
> almost immediately. It is almost always a bad thing when the drive takes
> the command but does not finish the initial command handshake sequence…
> You might want to look at the S.M.A.R.T. attributes for your drives as well
> to see if any of them are showing ‘pre-fail’ conditions)
>
> Rusty
>
> From: plug-discuss-bounces@lists.plug.phoenix.az.us
> [mailto:plug-discuss-bounces@lists.plug.phoenix.az.us] On Behalf Of Mark
> Phillips
> Sent: Monday, June 25, 2012 10:13 PM
> To: Main PLUG discussion list
> Subject: Re: Strange Server Behavior
>
> Right now, the server is not doing anything but sitting there....
>
> Tasks: 98 total, 1 running, 97 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Mem: 1033780k total, 217560k used, 816220k free, 6220k buffers
> Swap: 2019320k total, 0k used, 2019320k free, 94056k cached
>
> Plenty of swap, not very busy. It may be overheating, but I am not sure why.
>
> I am going to run a test tonight - ping every 10 seconds and time stamp
> the output into a file. Perhaps I will see gaps or unusually long response
> times and I can correlate that with the log files.
>
> Mark
>
> On Mon, Jun 25, 2012 at 10:09 PM, JD Austin < > wrote:
>
> I've had servers that act like that.. usually they're overheating,
> completely I/O bound, or swapping due to low available memory.
>
> On Mon, Jun 25, 2012 at 10:00 PM, Mark Phillips
> <mark@phillipsmarketing.biz> wrote:
>
> Nope - everything just stops - ping waits for a response, web services
> just wait for the server, file transfers stop and wait.......as if time
> just stopped for the server, then starts again without any errors being
> evident.
>
> Mark
>
> On Mon, Jun 25, 2012 at 9:57 PM, Stephen < > wrote:
>
> Can you access any other services hosted by the server during this
> time? Or even an extended ping?
>
> On Jun 25, 2012 9:53 PM, "Mark Phillips" < > wrote:
>
> I have a headless server running Linux version 2.6.32-5-686 (Debian
> 2.6.32-45) (dannf@debian.org) (gcc version 4.3.5 (Debian 4.3.5-4) and no
> X or window manager, and I have noticed in the past couple of days that
> when I ssh into the server it occasionally stops responding for a minute or
> two, then comes back as if nothing had happened. It is a random event -
> maybe once an hour. I cannot find anything in the logs - no error messages.
> There is nothing wrong with the machine where I initiated the ssh session,
> and it is not connected to ssh.
> The server completely stops responding,
> then comes back as if nothing had happened.
>
> How would I go about diagnosing this problem?
>
> Thanks,
>
> ---------------------------------------------------
> PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
> To subscribe, unsubscribe, or to change your mail settings:
> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss