On Wednesday 22 January 2003 02:42 pm, David Mandala wrote:
> Hmm, it does work for some storage clusters. AppleTalk clusters may have
> a problem, but other databases are able to cluster and accommodate a dead
> node without the need to reboot the clients. A well designed cluster
> built for a purpose can be designed to avoid choke points. Again,
> depending upon the devices and the design, it can and should take less
> than 1 second to detect a failed hardware point. Anything at the time
> limits you are describing (3 minutes) is unacceptable performance.
>
> With the correct design the customer, be it a web, database or calculation
> cluster, never knows a node went down, nor should they.

I'm sorry, but I am not aware of any current cluster designs for Linux that
replicate socket state between the nodes. MOSIX was supposedly working on
one, but as far as I know has never released it. Please correct me if I'm
wrong.

The fact of the matter is that for protocols that use a persistent socket
(such as SMB or AppleTalk, or databases for that matter), unless socket
state is replicated you cannot have transparent failover, no matter how
much other process state you replicate. The only folks I'm aware of as
having anything close to socket state replication are the Mission Critical
Linux folks, and they actually replicate NFS connection state for NFS RPC
sockets, not the sockets themselves, so that NFS failover can occur
transparently. However, note that a Kimberlite cluster can be in a failed
state for as long as 60 seconds before the slave determines that the master
has failed and assumes command of the cluster.

Now, for special-purpose applications, you can do your database failover on
the client side (rather than on the server side). That is, if a transaction
fails, you can repeat the transaction using a different database server.
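A minimal sketch of that client-side failover idea: try each server in turn
and repeat the transaction on the next one when a node is dead. The server
names and the transaction callable here are hypothetical placeholders, not
any particular database client library.

```python
# Client-side failover sketch: retry a failed transaction on another server.

class AllServersFailed(Exception):
    pass

def run_with_failover(transaction, servers):
    """Try the transaction on each server in turn; return the first success."""
    last_error = None
    for server in servers:
        try:
            return transaction(server)
        except ConnectionError as err:  # a dead node looks like a failed connection
            last_error = err            # remember the failure, try the next node
    raise AllServersFailed("no server completed the transaction: %s" % last_error)

# Simulated transaction for illustration: the first server is "down",
# the second one answers.
def demo_transaction(server):
    if server == "db1.example.com":
        raise ConnectionError("connection refused")
    return "committed on " + server

print(run_with_failover(demo_transaction, ["db1.example.com", "db2.example.com"]))
# -> committed on db2.example.com
```

The point is that the retry logic lives in the application, not in the
cluster: the client sees the failure and routes around it itself.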
Similarly, for write transactions, you can perform the transaction against
multiple database servers in order to maintain clustered database
redundancy. Your cluster members can then check their state at bootup to
make sure that they have all queued transactions, and can be "caught up" on
delayed transactions at that time (but indicate to the client that they
aren't available until the delayed transactions have been replayed). My
understanding is that some of the "name" databases have this support
already built into them. But this is part of the database/application, not
something that can be handled transparently by a cluster.

You can modify most any application to be clustered. But you can't take a
non-clustered network application and have it transparently handle the
situation where a cluster member goes away.

> Sorry, a Google cluster is not just a web cluster. A Google cluster
> consists of approximately 80 machines. Some are database slices (their
> database is HUGE), some are logic, and some are web. If any machine
> in the cluster fails you never see it; it does not matter if it is a
> database, logic or web machine.

Incorrect. If it is a web machine, you may get an aborted transfer error,
or, rather, the transfer appears to "hang". This occurs; I've encountered
it. As for the rest of a Google cluster, as I mentioned, you can modify
most any application to be clustered. That does not help if you want to use
standard applications that were not designed to be clustered, or that you
have no source code to (such as the SMB protocol stack on Windows, or the
AppleTalk protocol stack on Macs).

> Not all problems lend themselves to clusters. Those that do can make use
> of commodity hardware and Linux, and save big bucks compared to a big
> iron machine.

As I said:

> > The operant point is "can be split over a cluster". Not all problems can
> > be split over a cluster (or if they can, it is in a very clunky and
> > unreliable way).
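The write-side scheme described above can be sketched like so: apply each
write to every replica, queue writes for any replica that is down, and
replay the queue before the recovered replica serves clients again. All of
the class and server names here are illustrative, not a real database API.

```python
# Sketch of replicated writes with catch-up replay for a recovered node.
from collections import defaultdict

class ReplicatedWriter:
    def __init__(self, servers):
        self.servers = servers
        self.up = {s: True for s in servers}  # liveness as seen by the client
        self.applied = defaultdict(list)      # writes each replica has seen
        self.pending = defaultdict(list)      # writes queued for dead replicas

    def write(self, txn):
        for s in self.servers:
            if self.up[s]:
                self.applied[s].append(txn)   # normal path: apply everywhere
            else:
                self.pending[s].append(txn)   # node is down: queue for replay

    def mark_down(self, server):
        self.up[server] = False

    def recover(self, server):
        """Replay queued writes; the node is unavailable until caught up."""
        self.applied[server].extend(self.pending[server])
        self.pending[server].clear()
        self.up[server] = True

w = ReplicatedWriter(["db1", "db2"])
w.write("INSERT 1")
w.mark_down("db2")
w.write("INSERT 2")       # db2 misses this write; it is queued instead
w.recover("db2")          # db2 replays the queue before serving clients
print(w.applied["db2"])   # -> ['INSERT 1', 'INSERT 2']
```

Again, all of this lives in the database/application layer: the cluster
framework itself never sees or replays these transactions.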
-- 
Eric Lee Green          GnuPG public key at http://badtux.org/eric/eric.gpg
mailto:eric@badtux.org  Web: http://www.badtux.org