Some more pieces to the puzzle...

I ran the same ping test last night (laptop to server), but stopped the
following services on the server:

apache2 exim4 mediatomb mysql nfs-kernel-server nfs-common openvpnas cups ntp rpcbind

And there were no packets lost!

2802 packets transmitted, 2802 received, 0% packet loss, time 28010243ms
rtt min/avg/max/mdev = 0.063/0.157/0.319/0.033 ms

This is all that was running:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   2036   736 ?        Ss   Jun25   0:01 init [2]
root         2  0.0  0.0      0     0 ?        S    Jun25   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Jun25   0:00 [migration/0]
root         4  0.0  0.0      0     0 ?        S    Jun25   0:00 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S    Jun25   0:00 [watchdog/0]
root         6  0.0  0.0      0     0 ?        S    Jun25   0:00 [events/0]
root         7  0.0  0.0      0     0 ?        S    Jun25   0:00 [cpuset]
root         8  0.0  0.0      0     0 ?        S    Jun25   0:00 [khelper]
root         9  0.0  0.0      0     0 ?        S    Jun25   0:00 [netns]
root        10  0.0  0.0      0     0 ?        S    Jun25   0:00 [async/mgr]
root        11  0.0  0.0      0     0 ?        S    Jun25   0:00 [pm]
root        12  0.0  0.0      0     0 ?        S    Jun25   0:00 [sync_supers]
root        13  0.0  0.0      0     0 ?        S    Jun25   0:00 [bdi-default]
root        14  0.0  0.0      0     0 ?        S    Jun25   0:00 [kintegrityd/0]
root        15  0.0  0.0      0     0 ?        S    Jun25   0:00 [kblockd/0]
root        16  0.0  0.0      0     0 ?        S    Jun25   0:00 [kacpid]
root        17  0.0  0.0      0     0 ?        S    Jun25   0:00 [kacpi_notify]
root        18  0.0  0.0      0     0 ?        S    Jun25   0:00 [kacpi_hotplug]
root        19  0.0  0.0      0     0 ?        S    Jun25   0:00 [kseriod]
root        21  0.0  0.0      0     0 ?        S    Jun25   0:00 [kondemand/0]
root        22  0.0  0.0      0     0 ?        S    Jun25   0:00 [khungtaskd]
root        23  0.0  0.0      0     0 ?        S    Jun25   0:00 [kswapd0]
root        24  0.0  0.0      0     0 ?        SN   Jun25   0:00 [ksmd]
root        25  0.0  0.0      0     0 ?        S    Jun25   0:00 [aio/0]
root        26  0.0  0.0      0     0 ?        S    Jun25   0:00 [crypto/0]
root       154  0.0  0.0      0     0 ?        S    Jun25   0:00 [ksuspend_usbd]
root       155  0.0  0.0      0     0 ?        S    Jun25   0:00 [khubd]
root       157  0.0  0.0      0     0 ?        S    Jun25   0:00 [ata/0]

So, perhaps I don't have a hardware problem, but a software problem?

Mark

On Tue, Jun 26, 2012 at 9:02 PM, Stephen wrote:

> What about pings from the server?
> Also my paranoia about this would have
> me checking the arp tables to see if the ip address is getting
> mis-somethinged. Also see if you can make a task that will write to a file
> once every 10 sec and see if the server is falling asleep or if it is
> network related. It may be nic or switch for example.
>
> On Jun 26, 2012 7:42 PM, "Mark Phillips" wrote:
>
>> My ping test is not what I expected but still shows a problem.
>>
>> I set up the test to ping (64 bytes, ttl=64) the problem server every 10
>> seconds from my laptop. Both are plugged in, on the same subnet. The boxes
>> are about 5 feet apart. Here are the results:
>>
>> 3434 packets transmitted, 3307 received, 3% packet loss, time 34336321ms
>> rtt min/avg/max/mdev = 0.100/0.231/4.332/0.275 ms
>>
>> I found 38 instances where the time stamps for the pings "hiccuped" and
>> there was a delay. Each hiccup lasted (min/avg/max) 10/44/70 seconds. The
>> time between hiccups was (min/avg/max) 1:20/14:29/29:00 minutes.
>>
>> I grep'd the log files for messages around these 38 incidents, but did
>> not find any messages in any of the logs. However, this is not a good test,
>> so I just trawled through the logs and didn't find anything significant.
>>
>> Do these numbers strike a chord with anyone?
>>
>> I will check the caps on the MB later this week.
>>
>> I looked in syslog, and could not find any correlation with the 38
>> hiccups. However, what do these two cron jobs do, since they run quite
>> frequently:
>>
>> CMD ( cd / && run-parts --report /etc/cron.hourly)
>> CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete)
>>
>> A search through the log files for "error" does not return anything
>> interesting.
>>
>> Since apache and a few other apps were running on the server, I will run
>> the ping test again tonight after I kill everything on the server.
>>
>> Thanks!
>>
>> Mark
>>
>> On Tue, Jun 26, 2012 at 9:35 AM, Carruth, Rusty <
>> Rusty.Carruth@smartstoragesys.com> wrote:
>>
>>> Curious how your test turned out.
>>>
>>> You may also want to run an iostat to a file and see if that correlates
>>> to the slow responses.
>>>
>>> However, that ‘bulging capacitor’ thing others have mentioned sounds
>>> like a pretty convincing coincidence, as it were….
>>>
>>> (I will say that USUALLY I’d agree with JD: being I/O-bound or low on RAM
>>> (and thus I/O-bound on swap space) are the only things I’ve seen that cause
>>> unresponsiveness. I’ve never seen an overheat slow a machine down; usually
>>> it just dies suddenly. I probably get fast overheating and not slow
>>> increases in heat levels :-) )
>>>
>>> OH! WAIT! I just remembered another event - and it WON’T show up in
>>> normal performance logs. If ‘you’ send a command to a disk drive, and it
>>> goes busy for a long time, your system can become totally locked until the
>>> timeout happens and the kernel gives up. (If that happens, there SHOULD be
>>> a timeout recorded in the syslog or /var/log/messages. Check there for
>>> timeouts on disk drives or hard resets or such.) (I know this because of
>>> where I work :-) ) (Disk drives are supposed to acknowledge the command
>>> almost immediately. It is almost always a bad thing when the drive takes
>>> the command but does not finish the initial command handshake sequence…
>>> You might want to look at the S.M.A.R.T.
>>> attributes for your drives as well
>>> to see if any of them are showing ‘pre-fail’ conditions)
>>>
>>> Rusty
>>>
>>> From: plug-discuss-bounces@lists.plug.phoenix.az.us [mailto:
>>> plug-discuss-bounces@lists.plug.phoenix.az.us] On Behalf Of Mark
>>> Phillips
>>> Sent: Monday, June 25, 2012 10:13 PM
>>> To: Main PLUG discussion list
>>> Subject: Re: Strange Server Behavior
>>>
>>> Right now, the server is not doing anything but sitting there....
>>>
>>> Tasks:  98 total,   1 running,  97 sleeping,   0 stopped,   0 zombie
>>> Cpu(s):  0.0%us,  0.3%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>>> Mem:   1033780k total,   217560k used,   816220k free,     6220k buffers
>>> Swap:  2019320k total,        0k used,  2019320k free,    94056k cached
>>>
>>> Plenty of swap, not very busy. It may be overheating, but not sure why.
>>>
>>> I am going to run a test tonight - ping every 10 seconds and time stamp
>>> the output into a file. Perhaps I will see gaps or unusually long response
>>> times and I can correlate that with the log files.
>>>
>>> Mark
>>>
>>> On Mon, Jun 25, 2012 at 10:09 PM, JD Austin wrote:
>>>
>>> I've had servers that act like that.. usually they're overheating,
>>> completely I/O bound, or swapping due to low available memory.
>>>
>>> On Mon, Jun 25, 2012 at 10:00 PM, Mark Phillips <
>>> mark@phillipsmarketing.biz> wrote:
>>>
>>> Nope - everything just stops - ping waits for a response, web services
>>> just wait for the server, file transfers stop and wait.......as if time
>>> just stopped for the server, then starts again without any errors being
>>> evident.
>>>
>>> Mark
>>>
>>> On Mon, Jun 25, 2012 at 9:57 PM, Stephen wrote:
>>>
>>> Can you access any other services hosted by the server during this
>>> time?
>>> Or even an extended ping?
>>>
>>> On Jun 25, 2012 9:53 PM, "Mark Phillips" wrote:
>>>
>>> I have a headless server running Linux version 2.6.32-5-686 (Debian
>>> 2.6.32-45) (dannf@debian.org) (gcc version 4.3.5 (Debian 4.3.5-4)) and
>>> no X or window manager, and I have noticed in the past couple of days that
>>> when I ssh into the server it occasionally stops responding for a minute or
>>> two, then comes back as if nothing had happened. It is a random event -
>>> maybe once an hour. I cannot find anything in the logs - no error messages.
>>> There is nothing wrong with the machine where I initiated the ssh session,
>>> and it is not connected to ssh. The server completely stops responding,
>>> then comes back as if nothing had happened.
>>>
>>> How would I go about diagnosing this problem?
>>>
>>> Thanks,
>>>
>>> ---------------------------------------------------
>>> PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
>>> To subscribe, unsubscribe, or to change your mail settings:
>>> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
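
For anyone repeating the timestamped ping test (or the "write to a file every 10 seconds" heartbeat suggested in the thread), here is a rough sketch of how the "hiccups" could be pulled out of such a log automatically. It is only an illustration, not what was actually run: the function names `parse_log` and `find_gaps`, the 5-second tolerance, and the log layout (each line starting with a `YYYY-MM-DD HH:MM:SS` timestamp) are all assumptions.

```python
from datetime import datetime, timedelta


def parse_log(lines, fmt="%Y-%m-%d %H:%M:%S"):
    """Parse one timestamp per non-blank line.

    Assumes (hypothetically) that every line starts with a 19-character
    timestamp such as '2012-06-26 22:00:00'; adjust for your own log.
    """
    return [datetime.strptime(line[:19], fmt) for line in lines if line.strip()]


def find_gaps(timestamps, interval_s=10.0, tolerance_s=5.0):
    """Return (previous, current, seconds) for every pair of consecutive
    samples spaced more than interval_s + tolerance_s apart."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = (cur - prev).total_seconds()
        if delta > interval_s + tolerance_s:
            gaps.append((prev, cur, delta))
    return gaps


if __name__ == "__main__":
    # Synthetic samples: one per 10 s, with a single 44 s stall in the middle.
    base = datetime(2012, 6, 26, 22, 0, 0)
    samples = [base + timedelta(seconds=s) for s in (0, 10, 20, 64, 74)]
    for prev, cur, seconds in find_gaps(samples):
        print(f"gap of {seconds:.0f}s between {prev} and {cur}")
```

The start and end times of each gap can then be compared against syslog or /var/log/messages entries from the same window. Running the same check on a heartbeat file written on the server itself would help separate "the server stalled" from "the network dropped packets".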