On Jun 23, 7:14pm, Eric Thelin wrote:

> I ran your code on a few different directories and compared it with the
> find based output and it was the same on ext2 partitions.  But on my smbfs
> mounted drive that I was checking on it was not descending past the
> called directory???  This would seem to be an issue in the File::Find
> module.  So I have made yet another version.  See below.

Interesting!

Yesterday, I was having trouble with another File::Find script which
produces a flat listing of all of the files in a directory structure.
(The list resembles that of what you get from "tar tvf".)  Normally
this program works great.  But I was trying it on a mounted cdrom (the
CD from O'Reilly's PalmPilot book) and it would only list the topmost
level.

I just took a look at /usr/lib/perl5/5.00503/File/Find.pm and have
found a fix (of a sort).  Here's a patch:

--- Find.pm.orig	Fri Jun 23 22:24:22 2000
+++ Find.pm	Fri Jun 23 22:26:12 2000
@@ -217,7 +217,8 @@ if ($^O eq 'VMS') {
 }
 
 $dont_use_nlink = 1
-    if $^O eq 'os2' || $^O eq 'dos' || $^O eq 'amigaos' || $^O eq 'MSWin32';
+    if $^O eq 'os2' || $^O eq 'dos' || $^O eq 'amigaos' || $^O eq 'MSWin32'
+       || $^O eq 'linux';
 
 # Set dont_use_nlink in your hint file if your system's stat doesn't
 # report the number of links in a directory as an indication

If you don't want to use this patch, you can just put the following in
your scripts which use File::Find:

    use File::Find;
    $File::Find::dont_use_nlink = 1;

After making this change, however, it wasn't clear to me whether the
deficiency was with File::Find or with the kernel.  After all, the
implementors of File::Find were just trying to optimize it so it would
run as fast as possible.  Perhaps the implementation of stat() is kind
of flakey for certain types of filesystems?

The key field for this line of inquiry is st_nlink.  This field is the
number of hard links to the file's inode.  When this value goes to
zero, it is safe to delete the inode.
For directories, st_nlink will (should) be the number of subdirectories
+ 2.  I've done a few experiments and this appears to be the case for
ext2.

The first thing to notice is that struct stat (in
include/asm-i386/stat.h) declares st_nlink as an unsigned short.  This
means that you can have a maximum of 65533 subdirectories in a given
directory.  Actually, the limit is even lower for ext2 since the
following appears in include/linux/ext2_fs.h:

    #define EXT2_LINK_MAX		32000

And, if you look in ext2_mkdir() in fs/ext2/namei.c, there's a check
which kicks you out with an error if EXT2_LINK_MAX is exceeded.

I notice that the alpha and the IA-64 are sensible and define st_nlink
as an unsigned int which means that linux running on these systems will
be able to have a much more sensible limit on the number of
subdirectories.

But I digress...

The next thing to do is to look at how the st_nlink field gets set when
doing a stat() call.  The following line is from cp_new_stat() in
fs/stat.c:

    tmp.st_nlink = inode->i_nlink;

I've done some more checking and it turns out that the i_nlink field is
declared as an nlink_t which is also an unsigned short.  (At least it
is for i386; there are other definitions for other architectures, but
many of them are unsigned short.)  So the limit on the number of links
even exists within struct inode.

If you grep the sources for i_nlink, you'll find the following comment
in isofs_read_inode in fs/isofs/inode.c:

    inode->i_nlink = 1;  /* Set to 1.  We know there are 2, but
                            the find utility tries to optimize
                            if it is 2, and it screws up.  It is
                            easier to give 1 which tells find to
                            do it the hard way. */

So this explains what's going on for me on my mounted CDROM filesystem,
but it doesn't explain your SMB problem.  I've stared at
fs/smbfs/inode.c, but nothing quite so obvious jumps out at me.  (Part
of samba is implemented in user space, right?  Perhaps the answer is
there.)

Could you cd to your SMB filesystem and print out the st_nlink value
from stat?
You can use perl to do it for you.  E.g.,

    ocotillo:kev$ cd /mnt/cdrom
    ocotillo:cdrom$ perl -e 'print [stat(".")]->[3], "\n"'
    1

(Note that this value matches the code that I found in the kernel
sources.)

If your problematic smbfs filesystem produces the same results, we can
optimize Find.pm so that it executes the dont_use_nlink path when
st_nlink is 1 (for directories).

Long term, it'd be nice to fix all of the filesystem implementations in
the kernel so that they all set st_nlink to a reasonable value, but
that looks like a bit of work.  (And even if we were able to get it
done, there'd be a lot of "legacy" systems which'd still need the
Find.pm patch.)

So the bottom line is we're going to need a Find.pm patch anyway.  The
patch above is a reasonable workaround, but I think we can do better.

Kevin
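
For anyone who wants to check the "subdirectories + 2" relationship on
their own filesystems, here is a minimal C sketch.  It is an
illustration only, not code from Find.pm or the kernel.  The function
names (count_subdirs, nlink_shortcut_usable, print_nlink_report) are
made up for this example, and the rule "trust st_nlink only when it is
at least 2" is my assumption, generalizing the st_nlink == 1 case seen
on isofs.

```c
#define _POSIX_C_SOURCE 200809L  /* for lstat() under strict compilers */

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Count the immediate subdirectories of `path` the slow way, by calling
 * lstat() on every entry.  Returns -1 if the directory cannot be opened.
 */
long count_subdirs(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *entry;
    char child[4096];

    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;

        int n = snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof child)
            continue;  /* name too long for this sketch; skip it */

        struct stat st;
        if (lstat(child, &st) == 0 && S_ISDIR(st.st_mode))
            count++;
    }

    closedir(dir);
    return count;
}

/*
 * Hypothetical rule (an assumption, not Find.pm's actual logic): a
 * directory link count below 2 cannot mean "subdirectories + 2", so the
 * nlink shortcut should not be used for that directory.
 */
int nlink_shortcut_usable(nlink_t nlink)
{
    return nlink >= 2;
}

/* Print st_nlink next to the subdirectory count found by scanning. */
void print_nlink_report(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0) {
        perror(path);
        return;
    }

    printf("%s: st_nlink=%lu subdirs=%ld shortcut usable: %s\n",
           path, (unsigned long)st.st_nlink, count_subdirs(path),
           nlink_shortcut_usable(st.st_nlink) ? "yes" : "no");
}
```

Calling print_nlink_report() from a small main() on an ext2 directory
and then on a mounted CD-ROM or smbfs share should make any mismatch
between st_nlink and the real subdirectory count visible.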