Some more pieces to the puzzle...
I ran the same ping test last night (laptop to server), but this time I
stopped the following services on the server first:
apache2
exim4
mediatomb
mysql
nfs-kernel-server
nfs-common
openvpnas
cups
ntp
rpcbind
And there were no packets lost!
2802 packets transmitted, 2802 received, 0% packet loss, time 28010243ms
rtt min/avg/max/mdev = 0.063/0.157/0.319/0.033 ms
This is all that was running:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 2036 736 ? Ss Jun25 0:01 init [2]
root 2 0.0 0.0 0 0 ? S Jun25 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S Jun25 0:00 [migration/0]
root 4 0.0 0.0 0 0 ? S Jun25 0:00 [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S Jun25 0:00 [watchdog/0]
root 6 0.0 0.0 0 0 ? S Jun25 0:00 [events/0]
root 7 0.0 0.0 0 0 ? S Jun25 0:00 [cpuset]
root 8 0.0 0.0 0 0 ? S Jun25 0:00 [khelper]
root 9 0.0 0.0 0 0 ? S Jun25 0:00 [netns]
root 10 0.0 0.0 0 0 ? S Jun25 0:00 [async/mgr]
root 11 0.0 0.0 0 0 ? S Jun25 0:00 [pm]
root 12 0.0 0.0 0 0 ? S Jun25 0:00 [sync_supers]
root 13 0.0 0.0 0 0 ? S Jun25 0:00 [bdi-default]
root 14 0.0 0.0 0 0 ? S Jun25 0:00 [kintegrityd/0]
root 15 0.0 0.0 0 0 ? S Jun25 0:00 [kblockd/0]
root 16 0.0 0.0 0 0 ? S Jun25 0:00 [kacpid]
root 17 0.0 0.0 0 0 ? S Jun25 0:00 [kacpi_notify]
root 18 0.0 0.0 0 0 ? S Jun25 0:00 [kacpi_hotplug]
root 19 0.0 0.0 0 0 ? S Jun25 0:00 [kseriod]
root 21 0.0 0.0 0 0 ? S Jun25 0:00 [kondemand/0]
root 22 0.0 0.0 0 0 ? S Jun25 0:00 [khungtaskd]
root 23 0.0 0.0 0 0 ? S Jun25 0:00 [kswapd0]
root 24 0.0 0.0 0 0 ? SN Jun25 0:00 [ksmd]
root 25 0.0 0.0 0 0 ? S Jun25 0:00 [aio/0]
root 26 0.0 0.0 0 0 ? S Jun25 0:00 [crypto/0]
root 154 0.0 0.0 0 0 ? S Jun25 0:00 [ksuspend_usbd]
root 155 0.0 0.0 0 0 ? S Jun25 0:00 [khubd]
root 157 0.0 0.0 0 0 ? S Jun25 0:00 [ata/0]
So, perhaps I don't have a hardware problem, but a software problem?
Mark
On Tue, Jun 26, 2012 at 9:02 PM, Stephen <cryptworks@gmail.com> wrote:
> What about pings from the server? Also my paranoia about this would have
> me checking the arp tables to see if the ip address is getting
> mis-somethinged. Also see if you can make a task that will write to a file
> once every 10 sec and see if the server is falling asleep or if it is
> network related. It may be nic or switch for example.
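
The heartbeat task suggested above could look something like the following. This is only a minimal sketch, assuming Python is available on the server; the function name `write_heartbeats`, the file name in the usage note, and the 10-second default (chosen to mirror the ping interval) are illustrative assumptions, not anything from the thread.

```python
import time
from datetime import datetime


def write_heartbeats(path, interval=10.0, count=None, sleep=time.sleep):
    """Append one timestamp line to `path` every `interval` seconds.

    count=None means keep going until the process is stopped. The file is
    flushed after every line so the last entry written before a stall is
    not left sitting in a buffer.
    """
    written = 0
    with open(path, "a") as log:
        while count is None or written < count:
            log.write(datetime.now().isoformat() + "\n")
            log.flush()
            written += 1
            sleep(interval)
    return written
```

Running `write_heartbeats("heartbeat.log")` on the server overnight (the file name is hypothetical) gives a second record to compare with the ping log. Gaps in the file at the same times as the ping hiccups would suggest the server itself is stalling; a regular file alongside dropped pings would point more toward the network side.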
> On Jun 26, 2012 7:42 PM, "Mark Phillips" <mark@phillipsmarketing.biz>
> wrote:
>
>> My ping test is not what I expected but still shows a problem.
>>
>> I set up the test to ping (64 bytes, ttl=64) the problem server every 10
>> seconds from my laptop. Both are plugged in, on the same subnet. The boxes
>> are about 5 feet apart. Here are the results:
>>
>> 3434 packets transmitted, 3307 received, 3% packet loss, time 34336321ms
>> rtt min/avg/max/mdev = 0.100/0.231/4.332/0.275 ms
>>
>> I found 38 instances where the time stamps for the pings "hiccuped" and
>> there was a delay. Each hiccup lasted (min/avg/max) 10/44/70 seconds. The
>> time between hiccups was (min/avg/max) 1:20/14:29/29:00 minutes.
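
A sketch of how hiccups like these can be pulled out of a time-stamped ping log. It assumes each reply line starts with a Unix timestamp in square brackets, which is what `ping -D` prints; the actual log format used for the test is not shown in the thread, so the regular expression, the function names, and the 15-second threshold are all assumptions to adjust.

```python
import re

# Assumed reply-line format (as printed by `ping -D`):
#   [1340763748.123456] 64 bytes from 192.168.1.10: icmp_req=1 ttl=64 time=0.157 ms
REPLY = re.compile(r"^\[(\d+(?:\.\d+)?)\]\s+\d+ bytes from")


def find_gaps(lines, threshold=15.0):
    """Return (start, end, seconds) for each pair of consecutive replies
    whose timestamps are more than `threshold` seconds apart."""
    gaps = []
    previous = None
    for line in lines:
        match = REPLY.match(line)
        if not match:
            continue  # skip the header, summary, and error lines
        stamp = float(match.group(1))
        if previous is not None and stamp - previous > threshold:
            gaps.append((previous, stamp, stamp - previous))
        previous = stamp
    return gaps


def summarize(values):
    """Return (min, avg, max) of a non-empty list of numbers."""
    return min(values), sum(values) / len(values), max(values)
```

Passing the third element of each tuple to `summarize` gives the min/avg/max hiccup length, and the differences between successive gap start times give the spacing between hiccups. With a 10-second ping interval, a 15-second threshold flags even a single missed reply.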
>>
>> I grep'd the log files for messages around these 38 incidents, but did
>> not find any messages in any of the logs. However, this is not a good test,
>> so I just trawled the logs and didn't find anything significant.
>>
>> Do these numbers strike a chord with anyone?
>>
>> I will check the caps on the MB later this week.
>>
>> I looked in syslog, and could not find any correlation with the 38
>> hiccups. However, what do these two cron jobs do, since they run quite
>> frequently?
>> CMD ( cd / && run-parts --report /etc/cron.hourly)
>> CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find
>> /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -delete)
>>
>> A search through the log files for "error" does not return anything
>> interesting.
>>
>> Since apache and a few other apps were running on the server, I will run
>> the ping test again tonight after I kill everything on the server.
>>
>> Thanks!
>>
>> Mark
>>
>> On Tue, Jun 26, 2012 at 9:35 AM, Carruth, Rusty <
>> Rusty.Carruth@smartstoragesys.com> wrote:
>>
>>> Curious how your test turned out.
>>>
>>> You may also want to run an iostat to a file and see if that correlates
>>> to the slow responses.
>>>
>>> However, that ‘bulging capacitor’ thing others have mentioned sounds
>>> like a pretty convincing coincidence, as it were….
>>>
>>> (I will say that USUALLY I’d agree with JD – I/O-bound or low RAM (and
>>> thus I/O-bound on swap space) are the only things I’ve seen that cause
>>> unresponsiveness. I have never seen an overheat slow a machine down;
>>> usually it just dies suddenly. I probably get fast overheating and not
>>> slow increases in heat levels :) )
>>>
>>> OH! WAIT! I just remembered another event – and it WON’T show up in
>>> normal performance logs. If ‘you’ send a command to a disk drive, and it
>>> goes busy for a long time, your system can become totally locked until the
>>> timeout happens and the kernel gives up. (If that happens, there SHOULD be
>>> a timeout recorded in the syslog or /var/log/messages. Check there for
>>> timeouts on disk drives or hard resets or such.) (I know this because of
>>> where I work :) ) (Disk drives are supposed to acknowledge the command
>>> almost immediately. It is almost always a bad thing when the drive takes
>>> the command but does not finish the initial command handshake sequence…
>>> You might want to look at the S.M.A.R.T. attributes for your drives as well
>>> to see if any of them are showing ‘pre-fail’ conditions.)
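
A rough sketch of the log check described above. The keyword list is an assumption based only on the words used in that paragraph (timeouts, hard resets); actual kernel messages vary between drivers and kernel versions, so a match is a lead to follow up rather than proof of a disk problem. The function names are illustrative.

```python
import re

# Keywords are assumptions taken from the description above, not an
# exhaustive list of kernel disk-error messages.
PATTERN = re.compile(r"timeout|timed out|hard reset", re.IGNORECASE)


def suspicious_lines(lines):
    """Yield (line_number, line) for lines that mention a timeout or reset."""
    for number, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            yield number, line.rstrip("\n")


def scan(path):
    """Scan one log file, e.g. /var/log/syslog or /var/log/messages."""
    with open(path, errors="replace") as log:
        return list(suspicious_lines(log))
```

Comparing the timestamps on any matching lines with the times of the ping hiccups would show whether the two line up.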
>>>
>>> Rusty
>>>
>>> From: plug-discuss-bounces@lists.plug.phoenix.az.us [mailto:plug-discuss-bounces@lists.plug.phoenix.az.us] On Behalf Of Mark Phillips
>>> Sent: Monday, June 25, 2012 10:13 PM
>>> To: Main PLUG discussion list
>>> Subject: Re: Strange Server Behavior
>>>
>>> Right now, the server is not doing anything but sitting there....
>>>
>>> Tasks: 98 total, 1 running, 97 sleeping, 0 stopped, 0 zombie
>>> Cpu(s): 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>>> Mem: 1033780k total, 217560k used, 816220k free, 6220k buffers
>>> Swap: 2019320k total, 0k used, 2019320k free, 94056k cached
>>>
>>> Plenty of swap, not very busy. It may be overheating, but I am not sure why.
>>>
>>> I am going to run a test tonight - ping every 10 seconds and time stamp
>>> the output into a file. Perhaps I will see gaps or unusually long response
>>> times and I can correlate that with the log files.
>>>
>>> Mark
>>>
>>> On Mon, Jun 25, 2012 at 10:09 PM, JD Austin <jd@twingeckos.com> wrote:
>>>
>>> I've had servers that act like that... usually they're overheating,
>>> completely I/O bound, or swapping due to low available memory.
>>>
>>> On Mon, Jun 25, 2012 at 10:00 PM, Mark Phillips <mark@phillipsmarketing.biz> wrote:
>>>
>>> Nope - everything just stops - ping waits for a response, web services
>>> just wait for the server, file transfers stop and wait.......as if time
>>> just stopped for the server, then starts again without any errors being
>>> evident.
>>>
>>> Mark
>>>
>>> On Mon, Jun 25, 2012 at 9:57 PM, Stephen < > wrote:
>>>
>>> Can you access any other services hosted by the server during this
>>> time? Or even an extended ping?
>>>
>>> On Jun 25, 2012 9:53 PM, "Mark Phillips" < > wrote:
>>>
>>> I have a headless server running Linux version 2.6.32-5-686 (Debian
>>> 2.6.32-45) (dannf@debian.org) (gcc version 4.3.5 (Debian 4.3.5-4)) and
>>> no X or window manager, and I have noticed in the past couple of days that
>>> when I ssh into the server it occasionally stops responding for a minute or
>>> two, then comes back as if nothing had happened. It is a random event -
>>> maybe once an hour. I cannot find anything in the logs - no error messages.
>>> There is nothing wrong with the machine where I initiated the ssh session,
>>> and it is not connected to ssh. The server completely stops responding,
>>> then comes back as if nothing had happened.
>>>
>>> How would I go about diagnosing this problem?
>>>
>>> Thanks,
>>>
>>> ---------------------------------------------------
>>> PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
>>> To subscribe, unsubscribe, or to change your mail settings:
>>> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
>>>