NFS Question -- any takers?

Author: June Tate

Hi all,

I've got a rather interesting NFS problem with a few servers I
administer at work, and was wondering if any of you have run into this
particular issue before. For reference, here's a quick ASCII-gram layout
of what we're running:

[ NFS-Server ]
  |  |
  |  \------------- [ Client 1 ]
  |
  \---------------- [ Client 2 ]

Nothing special, really. All three of the servers are running Debian
Stable, and the NFS server is currently running the user space NFS
server. Additionally, the two clients are running Apache and mount
/var/www off of the NFS server (the two clients are, in fact, part of an
LVS web cluster and run as the actual web servers).

The problem we have is that every once in a while, we'll start seeing
"Input/Output Error" messages on one or both of the clients. For
instance:

    client-1:/var/www# ls
    test6/: Input/Output Error
    total 25
    test1/
    test2/
    test3/
    test4/
    test5/
    client-1:/var/www# _
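
To find out which entries a client cannot stat without reading through
ls output by hand, something like the following could be run on each
client. This is only a sketch: the function name find_unreadable is made
up for illustration, and it assumes the failure shows up as an OSError
(for example EIO) from lstat() on the affected entry.

```python
import errno
import os
import sys


def find_unreadable(directory):
    """Return (name, errno symbol) for every entry that lstat() fails on."""
    bad = []
    for name in sorted(os.listdir(directory)):
        try:
            # lstat() does not follow symlinks, so a broken link is not an error.
            os.lstat(os.path.join(directory, name))
        except OSError as exc:
            symbol = errno.errorcode.get(exc.errno, str(exc.errno))
            bad.append((name, symbol))
    return bad


if __name__ == "__main__":
    # Pass the mount point as the first argument, e.g. /var/www.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, symbol in find_unreadable(target):
        print(f"{name}: {symbol}")
```

Running it against the same directory on both clients and on the server
would show whether the failing entries are the same everywhere.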


The oddball thing is, when I look at the NFS server side, the files are
there and in perfect order. Additionally, when I look at the client's
NFS logs I see the following message:

    Jan  4 08:50:26 client-1 kernel: nfs_refresh_inode: inode \
    52609047 mode changed, 0120777 to 0100644
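
As a side note, the two octal values in that message are raw st_mode
values, and they can be decoded with Python's standard stat module. The
helper name describe_mode below is made up for illustration.

```python
import stat


def describe_mode(mode):
    """Return (file type, permission bits, ls-style string) for a raw st_mode."""
    if stat.S_ISLNK(mode):
        kind = "symlink"
    elif stat.S_ISREG(mode):
        kind = "regular file"
    elif stat.S_ISDIR(mode):
        kind = "directory"
    else:
        kind = "other"
    return kind, oct(stat.S_IMODE(mode)), stat.filemode(mode)


# The two values from the kernel log line, parsed as octal.
for raw in ("0120777", "0100644"):
    print(raw, describe_mode(int(raw, 8)))
```

The first value decodes as a symlink (lrwxrwxrwx) and the second as a
regular file (-rw-r--r--). One possible reading, not confirmed here, is
that the client had the inode cached as a symlink while the server now
reports a regular file under the same inode number.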


And on the server in the daemon.log:

    Jan  4 08:48:00 nfs-server nfsd[16566]: fd cache inconsistency!


We use rsync to get our development files in /var/www from the staging
server in the office to the nfs-server out at the rack, so I figured
that maybe it was being caused by Apache keeping the file open and rsync
overwriting it at the same time.

After doing some hand tests on a couple of boxen in the office, however,
I proved this wasn't the case. Any process can keep a file open, and
when it's overwritten by some other process, the originating process
gets a "Stale NFS Handle" error and (usually) quits. In the case of
Apache, it just usually lets the child process that opened the file die
off and return a broken data stream. Either way though, the filesystem
remains intact on both client and server.
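
A hand test along these lines can be sketched in Python. The assumptions
here are mine: the replacement is done by writing a temporary file and
renaming it over the original (which is how rsync behaves by default,
without --inplace), and the stale handle surfaces as an OSError with
errno ESTALE. The function name read_after_replace is hypothetical.

```python
import errno
import os
import tempfile


def read_after_replace(path, new_data):
    """Hold path open, replace it via rename, then read from the old handle."""
    with open(path, "rb") as held:
        directory = os.path.dirname(os.path.abspath(path))
        # Write the new contents next to the original, then rename over it.
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as out:
            out.write(new_data)
        os.replace(tmp, path)
        try:
            return ("ok", held.read())
        except OSError as exc:
            if exc.errno == errno.ESTALE:
                return ("stale", b"")
            raise


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "index.html")
    with open(target, "wb") as f:
        f.write(b"old")
    print(read_after_replace(target, b"new"))
```

On a local POSIX filesystem the held handle simply keeps reading the old
contents. To reproduce the NFS case, the open would have to happen on a
client while the replacement is done on the server or on another client,
so the two halves of this function would need to be split across
machines.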

In the meantime, I've switched from the user-space server to the
kernel-space one in the hopes that this particular problem is
implementation dependent. So far the problem hasn't cropped up again,
but I think this question should be answered somehow, since it seems
that others have run into similar issues before and never got responses.

Also note that switching from NFS to Samba is not an option because
Samba does not honor case sensitivity. I have looked at alternative
network filesystems such as Intermezzo, Coda, and Lustre as well, but
none seem to fit our needs (nor are they particularly well-supported in
Debian).

Any ideas? =o)

-- 
June Tate * <> * http://www.theonelab.com
