What about pings from the server? Also, my paranoia about this would have me
checking the ARP tables to see if the IP address is getting mis-somethinged.
Also see if you can make a task that will write to a file once every 10 sec,
to see whether the server is falling asleep or whether it is network related.
It may be the NIC or the switch, for example.

On Jun 26, 2012 7:42 PM, "Mark Phillips" wrote:
> My ping test is not what I expected but still shows a problem.
>
> I set up the test to ping (64 bytes, ttl=64) the problem server every 10
> seconds from my laptop. Both are plugged in, on the same subnet. The boxes
> are about 5 feet apart. Here are the results:
>
> 3434 packets transmitted, 3307 received, 3% packet loss, time 34336321ms
> rtt min/avg/max/mdev = 0.100/0.231/4.332/0.275 ms
>
> I found 38 instances where the time stamps for the pings "hiccuped" and
> there was a delay. Each hiccup lasted (min/avg/max) 10/44/70 seconds. The
> time between hiccups was (min/avg/max) 1:20/14:29/29:00 minutes.
>
> I grep'd the log files for messages around these 38 incidents, but did not
> find any messages in any of the logs. However this is not a good test, so I
> just trolled the logs and didn't find anything significant.
>
> Do these numbers strike a chord with anyone?
>
> I will check the caps on the MB later this week.
>
> I looked in syslog, and could not find any correlation with the 38
> hiccups. However, what do these two cron jobs do, since they run quite
> frequently:
> CMD ( cd / && run-parts --report /etc/cron.hourly)
> CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find
> /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete)
>
> A search through the log files for "error" does not return anything
> interesting.
>
> Since apache and a few other apps were running on the server, I will run the
> ping test again tonight after I kill everything on the server.
>
> Thanks!
>
> Mark
>
> On Tue, Jun 26, 2012 at 9:35 AM, Carruth, Rusty <
> Rusty.Carruth@smartstoragesys.com> wrote:
>
>> Curious how your test turned out.
>>
>> You may also want to run an iostat to a file and see if that correlates
>> to the slow responses.
>>
>> However, that ‘bulging capacitor’ thing others have mentioned sounds like
>> a pretty convincing coincidence, as it were….
>>
>> (I will say that USUALLY I’d agree with JD – I/O bound or low RAM (and thus
>> I/O bound on swap space) are the only things I’ve seen that cause
>> unresponsiveness (never seen an overheat slow it down, usually it just dies
>> suddenly. I probably get fast overheating and not slow increases in heat
>> levels :)
>>
>> OH! WAIT! I just remembered another event – and it WON’T show up in
>> normal performance logs. If ‘you’ send a command to a disk drive, and it
>> goes busy for a long time, your system can become totally locked until the
>> timeout happens and the kernel gives up. (If that happens, there SHOULD be
>> a timeout recorded in the syslog or /var/log/messages. Check there for
>> timeouts on disk drives or hard resets or such). (I know this because of
>> where I work :) (Disk drives are supposed to acknowledge the command
>> almost immediately. It is almost always a bad thing when the drive takes
>> the command but does not finish the initial command handshake sequence…
>> You might want to look at the S.M.A.R.T. attributes for your drives as well
>> to see if any of them are showing ‘pre-fail’ conditions)
>>
>> Rusty
>>
>> *From:* plug-discuss-bounces@lists.plug.phoenix.az.us [mailto:
>> plug-discuss-bounces@lists.plug.phoenix.az.us] *On Behalf Of *Mark
>> Phillips
>> *Sent:* Monday, June 25, 2012 10:13 PM
>> *To:* Main PLUG discussion list
>> *Subject:* Re: Strange Server Behavior
>>
>> Right now, the server is not doing anything but sitting there....
>>
>> Tasks: 98 total, 1 running, 97 sleeping, 0 stopped, 0 zombie
>> Cpu(s): 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>> Mem: 1033780k total, 217560k used, 816220k free, 6220k buffers
>> Swap: 2019320k total, 0k used, 2019320k free, 94056k cached
>>
>> Plenty of swap, not very busy. It may be over heating, but not sure why.
>>
>> I am going to run a test tonight - ping every 10 seconds and time stamp
>> the output into a file. Perhaps I will see gaps or unusually long response
>> times and I can correlate that with the log files.
>>
>> Mark
>>
>> On Mon, Jun 25, 2012 at 10:09 PM, JD Austin < > wrote:
>>
>> I've had servers that act like that.. usually they're over heating,
>> completely I/O bound, or swapping due to low available memory.
>>
>> On Mon, Jun 25, 2012 at 10:00 PM, Mark Phillips <
>> mark@phillipsmarketing.biz> wrote:
>>
>> Nope - everything just stops - ping waits for a response, web services
>> just wait for the server, file transfers stop and wait.......as if time
>> just stopped for the server, then starts again without any errors being
>> evident.
>>
>> Mark
>>
>> On Mon, Jun 25, 2012 at 9:57 PM, Stephen < > wrote:
>>
>> Can you access any other services hosted by the server during this
>> time? Or even an extended ping?
>>
>> On Jun 25, 2012 9:53 PM, "Mark Phillips" < > wrote:
>>
>> I have a headless server running Linux version 2.6.32-5-686 (Debian
>> 2.6.32-45) (dannf@debian.org) (gcc version 4.3.5 (Debian 4.3.5-4) and no
>> X or window manager, and I have noticed in the past couple of days that
>> when I ssh in the server it occasionally stops responding for a minute or
>> two, then comes back as if nothing had happened. It is a random event -
>> maybe once an hour. I cannot find anything in the logs - no error messages.
>> There is nothing wrong with the machine where I initiated the ssh session,
>> and it is not connected to ssh.
>> The server completely stops responding, then comes back as if nothing
>> had happened.
>>
>> How would I go about diagnosing this problem?
>>
>> Thanks,
>>
>> ---------------------------------------------------
>> PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
>> To subscribe, unsubscribe, or to change your mail settings:
>> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
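
The "write to a file every 10 seconds" idea at the top of this thread could be sketched roughly as below. This is only an illustration, not something anyone on the list posted: the function names (write_heartbeat, load_timestamps, find_gaps, summarize), the log path in the usage note, and the 5-second slack are all assumptions. The writer is meant to run on the server itself and forces each timestamp to disk, so a stalled disk or a frozen kernel should show up as a gap in the file. If the file is continuous across a ping hiccup, that would point more toward the NIC or the switch.

```python
import os
import time
from datetime import datetime


def write_heartbeat(path, interval_s=10.0, count=None, sleep=time.sleep):
    """Append one ISO timestamp per interval to `path`.

    count=None means run until killed; a number stops after that many lines.
    """
    written = 0
    with open(path, "a") as f:
        while count is None or written < count:
            f.write(datetime.now().isoformat() + "\n")
            f.flush()
            os.fsync(f.fileno())  # force to disk so a stalled drive delays us
            written += 1
            sleep(interval_s)


def load_timestamps(path):
    """Read the heartbeat file back into a list of datetime objects."""
    with open(path) as f:
        return [datetime.fromisoformat(line.strip()) for line in f if line.strip()]


def find_gaps(timestamps, interval_s=10.0, slack_s=5.0):
    """Return (start, seconds) for consecutive stamps further apart than
    interval_s + slack_s. The slack value is an arbitrary assumption."""
    limit = interval_s + slack_s
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = (cur - prev).total_seconds()
        if delta > limit:
            gaps.append((prev, delta))
    return gaps


def summarize(gaps):
    """Return (min, avg, max) gap length in seconds, or None if no gaps."""
    if not gaps:
        return None
    durations = [seconds for _, seconds in gaps]
    return (min(durations), sum(durations) / len(durations), max(durations))
```

To use it, something like write_heartbeat("/tmp/heartbeat.log") could be left running on the server overnight (the path is hypothetical), and afterwards find_gaps(load_timestamps("/tmp/heartbeat.log")) gives the start time and length of each stall to compare against the ping log from the laptop.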