isp's

Sun, 28 May 2000 21:43:02 -0700

On May 29,  2:21pm, phil wrote:

> I'm not very knowledgeable about networking/internet type things.  How
> would I go about testing for packet loss?

Your first clue that there's a problem is that things like http
accesses, telnet sessions, ssh sessions, etc. will inexplicably hang
for short (or long) periods of time and then resume again (or not).
Once that happens, you need to determine just how close the problem is
to you.  I.e., if it's between you and your ISP or in some route
through your ISP, you can call your ISP and have them do something
about it.  If the problem is beyond that point, chances are good that
a whole lot of people already know about it and will do something
automatically (for both you and themselves).  And, of course, it's
possible that the problem is due to mis-configuration on your end,
in which case it's your responsibility to fix it.

In order to diagnose such a problem, tools like traceroute and ping
are your friends.  First you do a traceroute (if you can).  E.g.,

redrock:kev$ /usr/sbin/traceroute primenet.com
traceroute: Warning: primenet.com has multiple addresses; using 206.165.6.207
traceroute to primenet.com (206.165.6.207), 30 hops max, 40 byte packets
 1  pipeline (192.168.200.100)  6.037 ms  6.133 ms  5.666 ms
 2  is06.phx.gblx.net (206.165.11.206)  33.063 ms  27.139 ms  37.184 ms
 3  fe2-0.cr3.PHX.gblx.net (206.165.11.254)  30.709 ms  29.241 ms  28.518 ms
 4  fe8-1-0.cr1.PHX.gblx.net (206.165.6.113)  29.633 ms  27.950 ms  27.802 ms
 5  usr07.primenet.com (206.165.6.207)  30.717 ms  30.100 ms  29.600 ms
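
If you'd rather not copy the hop addresses out by hand, something like
the following Python sketch will pull them out of the traceroute output
so they can be pinged one at a time, nearest first.  The function name
parse_hops and the regular expression are my own illustration, not part
of any tool; they assume the "number  name (address)" line layout shown
above and simply skip hops that show up as "* * *".

```python
import re

# Matches a hop line in the layout shown above, e.g.
#  " 2  is06.phx.gblx.net (206.165.11.206)  33.063 ms  27.139 ms  37.184 ms"
HOP_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)")


def parse_hops(traceroute_output):
    """Return (hop_number, name, address) tuples in the order they appear."""
    hops = []
    for line in traceroute_output.splitlines():
        match = HOP_LINE.match(line)
        if match:  # header lines and "* * *" hops do not match and are skipped
            hops.append((int(match.group(1)), match.group(2), match.group(3)))
    return hops


sample = """\
traceroute to primenet.com (206.165.6.207), 30 hops max, 40 byte packets
 1  pipeline (192.168.200.100)  6.037 ms  6.133 ms  5.666 ms
 2  is06.phx.gblx.net (206.165.11.206)  33.063 ms  27.139 ms  37.184 ms
 5  usr07.primenet.com (206.165.6.207)  30.717 ms  30.100 ms  29.600 ms
"""

for number, name, address in parse_hops(sample):
    print(number, name, address)
```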

I chose to do my traceroute to one of the user machines at primenet
because that's where my ssh session was hanging.  Also, fetchmail was
connecting but getting hung prior to transferring any messages.

The first machine on the list (pipeline) is my ISDN router.  It's actually
a Netgear RT328, but I used to have an Ascend Pipeline P50 there and have
been too lazy to update my nameserver.  (Yes, I know it's easy to change
the name, but it doesn't really bother me.)

My next step was to ping pipeline:

redrock:kev$ ping 192.168.200.100
PING 192.168.200.100 (192.168.200.100): 56 data bytes
64 bytes from 192.168.200.100: icmp_seq=0 ttl=59 time=4.8 ms
64 bytes from 192.168.200.100: icmp_seq=1 ttl=59 time=4.5 ms
64 bytes from 192.168.200.100: icmp_seq=2 ttl=59 time=3.7 ms

--- 192.168.200.100 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 3.7/4.3/4.8 ms

Next, I pinged the next machine in the route, which in this case was
is06.phx.gblx.net:

redrock:kev$ ping 206.165.11.206
PING primenet.com (206.165.11.206): 56 data bytes
64 bytes from 206.165.11.206: icmp_seq=0 ttl=251 time=111.2 ms
64 bytes from 206.165.11.206: icmp_seq=1 ttl=251 time=33.1 ms
64 bytes from 206.165.11.206: icmp_seq=2 ttl=251 time=33.0 ms
64 bytes from 206.165.11.206: icmp_seq=3 ttl=251 time=31.9 ms
64 bytes from 206.165.11.206: icmp_seq=4 ttl=251 time=33.6 ms
64 bytes from 206.165.11.206: icmp_seq=5 ttl=251 time=32.8 ms
64 bytes from 206.165.11.206: icmp_seq=6 ttl=251 time=33.4 ms
64 bytes from 206.165.11.206: icmp_seq=7 ttl=251 time=32.1 ms
64 bytes from 206.165.11.206: icmp_seq=8 ttl=251 time=32.9 ms
64 bytes from 206.165.11.206: icmp_seq=9 ttl=251 time=32.9 ms
64 bytes from 206.165.11.206: icmp_seq=10 ttl=251 time=102.1 ms
64 bytes from 206.165.11.206: icmp_seq=16 ttl=251 time=33.3 ms
64 bytes from 206.165.11.206: icmp_seq=17 ttl=251 time=32.8 ms
64 bytes from 206.165.11.206: icmp_seq=18 ttl=251 time=32.7 ms
64 bytes from 206.165.11.206: icmp_seq=19 ttl=251 time=32.1 ms
64 bytes from 206.165.11.206: icmp_seq=20 ttl=251 time=86.8 ms
64 bytes from 206.165.11.206: icmp_seq=21 ttl=251 time=34.8 ms
64 bytes from 206.165.11.206: icmp_seq=22 ttl=251 time=35.6 ms
64 bytes from 206.165.11.206: icmp_seq=23 ttl=251 time=32.1 ms
64 bytes from 206.165.11.206: icmp_seq=24 ttl=251 time=33.1 ms
64 bytes from 206.165.11.206: icmp_seq=25 ttl=251 time=32.9 ms
[...]

--- primenet.com ping statistics ---
166 packets transmitted, 101 packets received, 39% packet loss
round-trip min/avg/max = 31.4/42.5/150.6 ms
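
Notice that the output above jumps from icmp_seq=10 straight to
icmp_seq=16; the replies in between never came back.  A rough Python
sketch for spotting such gaps in saved ping output follows.  The name
missing_sequences is made up for illustration, and it assumes each
reply line carries an "icmp_seq=N" field as shown here.

```python
import re

SEQ = re.compile(r"icmp_seq=(\d+)")


def missing_sequences(ping_output):
    """List sequence numbers absent between the first and last reply seen."""
    seen = {int(n) for n in SEQ.findall(ping_output)}
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]


sample = """\
64 bytes from 206.165.11.206: icmp_seq=9 ttl=251 time=32.9 ms
64 bytes from 206.165.11.206: icmp_seq=10 ttl=251 time=102.1 ms
64 bytes from 206.165.11.206: icmp_seq=16 ttl=251 time=33.3 ms
64 bytes from 206.165.11.206: icmp_seq=17 ttl=251 time=32.8 ms
"""

print(missing_sequences(sample))  # [11, 12, 13, 14, 15]
```

This only sees losses between the first and last reply received; packets
dropped before the first reply or after the last one don't show up as
gaps, so the summary line remains the better overall figure.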

The final statistics tell you the packet loss.  In this case, it was
39%.  (I got quite a variety of results, though, ranging from 19% to
greater than 50%.)
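
The percentage is just the fraction of packets that never came back:
(166 - 101) / 166, or roughly 39%.  If you want to pull the number out
of saved output, here is a small Python sketch.  The function name
packet_loss_percent is hypothetical, and the pattern assumes the
"N packets transmitted, M packets received" wording shown above; other
versions of ping word the summary line differently.

```python
import re

SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) packets received")


def packet_loss_percent(summary_line):
    """Compute the loss percentage from a ping summary line."""
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line in the expected format")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        return 0.0  # nothing was sent, so nothing can be counted as lost
    return 100.0 * (sent - received) / sent


line = "166 packets transmitted, 101 packets received, 39% packet loss"
print(round(packet_loss_percent(line), 1))  # 39.2
```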

After primenet got the problem resolved, I was able to ping a
primenet machine continuously *for several hours* with *no* packet
loss.

If you are having DNS problems, or if you're seeing extremely severe
packet loss, you will want to use the -n switch with ping and with
traceroute in order to avoid nameserver lookups.  (Otherwise, these
utilities will simply hang before they get very far at all.)
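
If you end up scripting these checks, the -n switch is easy to build in
from the start.  The Python sketch below only assembles the command
lines; the helper names are invented for illustration, and the -c
(count) option is an assumption on my part that holds for the ping
found on Linux and the BSDs but should be checked against your own man
page.

```python
def ping_command(host, count=10, numeric=True):
    """Build an argument list for ping; -n skips nameserver lookups."""
    cmd = ["ping"]
    if numeric:
        cmd.append("-n")
    # -c limits the number of packets sent (assumed: Linux/BSD ping)
    cmd += ["-c", str(count), host]
    return cmd


def traceroute_command(host, numeric=True):
    """Build an argument list for traceroute; -n skips nameserver lookups."""
    cmd = ["traceroute"]
    if numeric:
        cmd.append("-n")
    cmd.append(host)
    return cmd


print(" ".join(ping_command("206.165.11.206")))
print(" ".join(traceroute_command("206.165.6.207")))
```

Either list can be handed to subprocess.run() to actually run the
command.  On some systems traceroute isn't on the default path (mine
lives in /usr/sbin), so you may need to give the full path.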

There are times when you won't be able to ping a remote machine
because the site (or some machine along the way) blocks the ICMP
protocol packets needed to do the ping.   The reason for this is
that malformed ping packets have been used for a variety of denial
of service attacks.  (At one time Linux was susceptible to some of
these attacks, but once they became known the problem was quickly
fixed.)  Many times though, traceroute will still work - at least
partially - so that you'll be able to see (some of) the machines
along the route.  And you'll frequently be able to ping these
intermediate machines too for the simple reason that network
administrators need to use ping to diagnose network problems.

Kevin