Hi all,

I've got a rather interesting NFS problem with a few servers I administer at work, and was wondering if any of you have run into this particular issue before. For reference, here's a quick ASCII-gram layout of what we're running:

  [ NFS-Server ]
    |  |
    |  \------------- [ Client 1 ]
    |
    \---------------- [ Client 2 ]

Nothing special, really. All three of the servers are running Debian Stable, and the NFS server is currently running the user-space NFS server. Additionally, the two clients are running Apache and mount /var/www off of the NFS server (the two clients are, in fact, part of an LVS web cluster and run as the actual web servers).

The problem we have is that every once in a while, we'll start seeing "Input/Output Error" messages on one or both of the clients. For instance:

  client-1:/var/www# ls
  test6/: Input/Output Error
  total 25
  test1/ test2/ test3/ test4/ test5/
  client-1:/var/www# _

The oddball thing is, when I look at the NFS server side, the files are there and in perfect order. Additionally, when I look at the client's NFS logs I see the following message:

  Jan 4 08:50:26 client-1 kernel: nfs_refresh_inode: inode \
      52609047 mode changed, 0120777 to 0100644

And on the server in the daemon.log:

  Jan 4 08:48:00 nfs-server nfsd[16566]: fd cache inconsistency!

We use rsync to get our development files in /var/www from the staging server in the office to the nfs-server out at the rack, so I figured that maybe it was being caused by Apache keeping the file open and rsync overwriting it at the same time. After doing some hand tests on a couple of boxen in the office, however, I proved this wasn't the case. Any process can keep a file open, and when it's overwritten by some other process, the originating process gets a "Stale NFS Handle" error and (usually) quits.
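A rough sketch of this kind of hand test, in Python. The file name `page.html`, the helper `replace_while_open`, and the scratch directory are made up for illustration and are not taken from the actual setup. The sketch assumes the new version is written to a temporary file and renamed into place, which is rsync's default way of updating a file. Run on a local disk it only shows the mechanism: the path ends up pointing at a new inode while the open handle still refers to the old one. On NFS, with the reader on a client and the replacement done on the server, that old handle is what can come back as ESTALE.

```python
import errno
import os
import tempfile


def replace_while_open(directory):
    """Hold a file open, replace it by rename, then read the old handle.

    Returns (inode_before, inode_after, outcome, data).
    """
    target = os.path.join(directory, "page.html")  # hypothetical file name
    with open(target, "w") as f:
        f.write("old contents\n")

    # Stands in for the long-running process (e.g. a web server child).
    reader = open(target, "r")
    inode_before = os.fstat(reader.fileno()).st_ino

    # Write the new version to a temporary file and rename it over the
    # original, so the path now refers to a different inode.
    tmp = target + ".tmp"
    with open(tmp, "w") as f:
        f.write("new contents\n")
    os.replace(tmp, target)
    inode_after = os.stat(target).st_ino

    try:
        data = reader.read()
        outcome = "read ok"
    except OSError as exc:
        data = None
        if exc.errno == errno.ESTALE:
            outcome = "ESTALE"  # stale NFS file handle
        else:
            outcome = "errno %d" % exc.errno
    finally:
        reader.close()
    return inode_before, inode_after, outcome, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(replace_while_open(scratch))
```

To try it against a real mount, the two halves would need to be split: the open-and-read part on a client, the write-and-rename part on the server or another host.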
In the case of Apache, it usually just lets the child process that opened the file die off and return a broken data stream. Either way though, the filesystem remains intact on both client and server.

In the meantime, I've switched from the user-space server to the kernel-space one in the hope that this particular problem is implementation-dependent. So far the problem hasn't cropped up again, but I think this question should be answered somehow, since it seems that others have run into similar issues before and never had responses.

Also note that switching from NFS to Samba is not an option because Samba does not honor case sensitivity. I have looked at alternative network filesystems such as InterMezzo, Coda, and Lustre as well, but none seem to fit our needs (nor are they particularly well-supported in Debian).

Any ideas? =o)

-- 
June Tate * * http://www.theonelab.com