Linux as backup (failover) machine

Trent Shipley tshipley@symbio-tech.com
Sat, 4 Nov 2000 14:57:30 -0700


> -----Original Message-----
> From: plug-discuss-admin@lists.PLUG.phoenix.az.us
> [mailto:plug-discuss-admin@lists.PLUG.phoenix.az.us]On Behalf Of Kevin
> Buettner
> Sent: Friday, November 03, 2000 7:08 PM
> To: plug-discuss@lists.PLUG.phoenix.az.us
> Subject: Re: Linux as backup (failover) machine
>
>
> On Nov 4,  8:11am, Ken Bowley wrote:
>
> > I've been posed with a question, and I'm a little stumped...  please
> > bear with me.
> >
> > Problem:
> > Make a Linux machine automatically kick in as a failover machine for
> > http when the NT machine goes down.
> >
> > Restrictions:
> > Need to be able to monitor the NT box without installing anything
> > extra on the NT machine.  Linux machine needs to be able to kick in
> > automatically when the NT box goes down, and give control back to
> > the NT box when it comes back up.  No access to installing any type
> > of router/proxy between the NT and Linux box and the rest of the
> > net.
> >
> > Please send your ideas either directly to myself, or to the list if
> > this problem is of interest to others.
>
> First, I'm sure that there's some code already out there somewhere
> for this, but it doesn't sound terribly difficult to implement from
> scratch either.  (Maybe about five lines of Perl?)
>
> Anyway, the NT box is pingable, right?
>
> Set up a script which continuously pings the NT box; when the
> pings stop coming back, do an ifconfig on your network interface
> to the NT box's IP address.
>
> The relinquishing control part is harder, but could be easily
> solved if the NT machine had two network adapters; you could ping
> the second one to know when to give up the NT machine's IP
> address.
>
> So... thinking about this some more, it'd probably be best if
> both machines had two network cards.  Weird things happen
> when two machines attempt to use the same IP address.
>
> So here's how it'd look:
>
> ====+==+==============+==+========= Network
>     |  |              |  |
>    A| B|             C| D|
>     |  |              |  |
>    -+--+-           --+--+-
>   |  NT  |         | Linux |
>   --------         ---------
>
> Now suppose that NT is supplying its services via interface A and
> that you want Linux to use C when it acts as the failover.
>
> So...  start out with C disabled ("ifconfig eth0 down", or somesuch).
> Ping B via D.  When the pings stop coming back, do "ifconfig eth0 up ..."
> Now, you continue to ping B from D, and when the pings resume, just
> do "ifconfig eth0 down" again to allow the NT machine to take over
> again.
>
> It may be possible to make it work with a single NIC on the NT box,
> but I have doubts about the reliability.  (But someone who knows
> more about networking than I do might have some ideas.)
>
> Note too that you can tighten the whole arrangement up by doing:
>
> ====+=================+============ Network
>     |                 |
>    A|                C|
>     |                 |
>    -+-----         ---+----
>   |  NT  +----~----+ Linux |
>   -------- B     D ---------
>
> where the cable between B and D is a crossover cable.  That way too
> you could assign B and D network addresses intended for private
> networks (192.168.X.Y or 10.X.Y.Z).
>
> Okay, so maybe it's around 25 lines of Perl.  (It sounds interesting
> enough that I'm tempted to code it myself.)
>

If _In Search of Clusters, Second Edition_ by Gregory F. Pfister is any
indication, you are looking at a lot more than 25 lines of code.  Also, since
you are going to want to run the failover monitor on the Linux box as a
background daemon, it is questionable whether a scripting language is the
right choice for the implementation.
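
Even the bare loop Kevin describes needs a little state so that one lost
ping does not flip the interface back and forth.  A rough sketch in Python,
for illustration only: the address, interface name, threshold, and interval
are all made-up values, it assumes interface C (eth0) already has the
service address configured so that a plain "up" or "down" is enough, and it
says nothing about transactional correctness.

```python
import subprocess
import time

# All values below are assumptions for illustration, not from the thread.
PEER_ADDR = "192.168.1.1"   # hypothetical address of NT interface B
SERVICE_IF = "eth0"         # Linux interface C, which carries the service
THRESHOLD = 3               # consecutive probes required before switching
PROBE_INTERVAL = 5          # seconds between probes


def peer_alive(addr):
    """Send a single ping (Linux iputils flags); True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", addr],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def set_interface(up):
    """Bring the service interface up or down; needs root privileges."""
    subprocess.run(["ifconfig", SERVICE_IF, "up" if up else "down"], check=True)


def step(active, streak, alive, threshold=THRESHOLD):
    """Advance the failover state by one probe result.

    active: True while the Linux box holds the service interface up.
    streak: consecutive probes that contradict the current state.
    Returns the new (active, streak) pair.
    """
    # Standby contradicts a dead peer; active contradicts a live peer.
    contradicts = (alive == active)
    if not contradicts:
        return active, 0
    streak += 1
    if streak >= threshold:
        return (not active), 0
    return active, streak


def main():
    """Start in standby with C down, then probe B via D forever."""
    active, streak = False, 0
    set_interface(False)
    while True:
        new_active, streak = step(active, streak, peer_alive(PEER_ADDR))
        if new_active != active:
            set_interface(new_active)
            active = new_active
        time.sleep(PROBE_INTERVAL)
```

Calling main() as root would start the loop.  The decision logic lives in
step() so it can be exercised without touching the network; everything a
real failover needs beyond that (ARP cache updates, session state, data
consistency) is left out.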

Not being able to install a proxy or router between the failover pair and
the rest of the net is not much of a limitation.  A proxy or router would be
a dead end anyway because it just introduces another point of failure.

Not being able to alter the primary may mean that your boss just
ordered miracle-ware.  This is particularly true if the failover has to be
transactionally correct . . . and if the box is mission critical, then the
accountants are going to INSIST that no data be lost or created during the
failover.  (Transactional semantics may mean that the project cannot be done
in-house. . . .)

Failback is just as problematic, though you will get to recycle a lot of
code (but not all of it; the problems are not identical).

Unless you can find a canned freeware solution you might want to tell them
to look at buying another NT license (you might get away with a workstation
instead of a server version), two MTS licenses, and a proprietary failover
system.

Also, Oracle has a feature called "standby database" that is standard.  It
probably won't help with your problem, but it might be useful as an example.