Clustering VS. Mainframe

Author: Eric Lee Green
Date:  
Subject: Clustering VS. Mainframe
On Wednesday 22 January 2003 02:42 pm, David Mandala wrote:
> Hmm, it does work for some storage clusters. AppleTalk clusters may have
> a problem but other databases are able to cluster and accommodate a dead
> node without the need to reboot the clients. A well designed cluster
> built for a purpose can be designed to avoid choke points. Again
> depending upon the devices and the design it can and should take less
> than 1 second to detect a failed hardware point. Anything at the time
> limits you are describing (3 minutes) is unacceptable performance.
>
> With the correct design the customer, be it web, database or calculation
> cluster never knows a node went down, nor should they.


I'm sorry, but I am not aware of any current cluster designs for Linux that
replicate socket state between the nodes. MOSIX was supposedly working on
one, but as far as I know has never released one. Please correct me if I'm
wrong. The fact of the matter is that for protocols that use a persistent
socket (such as SMB or AppleTalk, or databases for that matter), unless
socket state is replicated you cannot have a transparent failover no matter
how much other process state you replicate.

The only folks I'm aware of who have anything close to socket state
replication are the Mission Critical Linux folks, and they actually replicate
NFS connection state for NFS RPC sockets, not the sockets themselves, so that
NFS failover can occur transparently. However, note that a Kimberlite cluster
can be in a failed state for as long as 60 seconds before the slave
determines that the master has failed and assumes command of the cluster.
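
To illustrate why detection is never instantaneous, here is a minimal sketch
of timeout-based failure detection. It is not Kimberlite's actual code; the
class name, method names, and the explicit "now" argument are all
hypothetical, chosen only to show that a slave cannot declare the master dead
until a full timeout has elapsed without a heartbeat.

```python
class HeartbeatMonitor:
    """Hypothetical slave-side monitor; illustrative only."""

    def __init__(self, timeout):
        self.timeout = timeout    # seconds of silence tolerated (assumed value)
        self.last_seen = None     # time of the most recent heartbeat

    def beat(self, now):
        """Record a heartbeat received from the master at time `now`."""
        self.last_seen = now

    def master_failed(self, now):
        """True only once more than `timeout` seconds have passed in silence."""
        if self.last_seen is None:
            return False          # never heard from the master; no verdict yet
        return now - self.last_seen > self.timeout
```

With a 60 second timeout, a master that dies right after a heartbeat goes
unnoticed for the whole 60 seconds. Shortening the timeout shortens that
window, but makes a slow or briefly partitioned master look dead.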

Now, for special purpose applications, you can do your database failover on
the client side (rather than on the server side). That is, if a transaction
fails, you can repeat the transaction using a different database server.
Similarly, for write transactions, you can perform the transaction to
multiple database servers in order to maintain clustered database redundancy.
Your cluster members can then check their state at bootup to make sure that
they have all queued transactions, and can be "caught up" on delayed
transactions at that time (but indicate to the client that they aren't
available yet until the delayed transactions have been replayed). My
understanding is that some of the "name" databases have this support already
built into them. But this is part of the database/application, not something
that can be handled transparently by a cluster. You can modify most any
application to be clustered. But you can't take a non-clustered network
application and have it transparently handle the situation where a cluster
member goes away.
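
As a rough sketch of what I mean by client-side failover (not any particular
database's API; FakeServer, ServerDown, and every function name below are
made up for illustration), the client retries reads on another server, sends
writes to all servers, and queues writes for any server that is down so it
can be caught up before it is marked available again:

```python
class ServerDown(Exception):
    """Raised by the hypothetical server stub when it cannot be reached."""


class FakeServer:
    """In-memory stand-in for a database server; not a real driver."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.rows = {}

    def execute(self, key, value=None):
        if not self.up:
            raise ServerDown(self.name)
        if value is not None:          # a write
            self.rows[key] = value
        return self.rows.get(key)


def read_with_failover(servers, key):
    """Repeat the same read on each server in turn until one answers."""
    for server in servers:
        try:
            return server.execute(key)
        except ServerDown:
            continue                   # the client hides the failure, not the cluster
    raise ServerDown("all servers unreachable")


def write_to_all(servers, key, value, pending):
    """Apply a write to every reachable server; queue it for the rest."""
    acked = 0
    for server in servers:
        try:
            server.execute(key, value)
            acked += 1
        except ServerDown:
            pending.setdefault(server.name, []).append((key, value))
    if acked == 0:
        raise ServerDown("write reached no server")
    return acked


def catch_up(server, pending):
    """Replay queued writes on a rejoining server, then mark it available."""
    for key, value in pending.pop(server.name, []):
        server.rows[key] = value
    server.up = True
```

Note that all of the failover logic lives in the client functions. A client
that simply holds one socket open to one server gets none of this for free.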

> Sorry a Google cluster is not just a web cluster. A Google cluster
> consists of approximately 80 machines. Some are database slices (their
> database is HUGE), and some are logic and some are web. If any machine
> in the cluster fails you never see it; it does not matter if it is a
> database, logic or web machine,


Incorrect. If it is a web machine, you may get an aborted transfer error, or,
rather, the transfer appears to "hang". This occurs. I've encountered it. As
for the rest of a Google cluster, as I mentioned, you can modify most any
application to be clustered. That does not help if you want to use
standard applications that were not designed to be clustered, or that you
have no source code for (such as the SMB protocol stack on Windows, or the
AppleTalk protocol stack on Macs).

> Not all problems lend themselves to clusters, those that do can make use
> of commodity hardware and Linux, and save big bucks compared to a big
> iron machine.


As I said:

> > The operant point is "can be split over a cluster". Not all problems can
> > be split over a cluster (or if they can, it is in a very clunky and
> > unreliable way).


-- 
Eric Lee Green          GnuPG public key at http://badtux.org/eric/eric.gpg
          mailto:eric@badtux.org  Web: http://www.badtux.org