OK, how do I count files in a directory QUICK!!!

Joseph Sinclair plug-discussion at stcaz.net
Fri Jul 2 18:35:01 MST 2010


I suspect you are going to find this task to be extremely difficult, if it's possible at all on the hardware you have.

There are two sources of the slowness:
1) You have to read the directory file. With 1M files, that's probably on the order of 200+ MB of data to actually read from disk.
2) ls is piping all that data into another program, so you're paying for the I/O twice (although the second time it's 1000 times faster).

With this approach, you'll generate so much disk I/O that you may find there's no bandwidth left to actually read and write files...

The fastest possible way is to use opendir and readdir to read through the directory entries and count them. You still have to read the directory file (there's no way around that), but at least you're not doing anything more.

The following C++ code snippet is untested and probably not entirely correct, but it should be close.

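#include <sys/types.h>
#include <dirent.h>
#include <errno.h>

...

// Status codes returned by countFiles (the values are arbitrary).
enum { COUNT_OK = 0, OPEN_ERROR = 1, READ_ERROR = 2 };

// Counts the regular files in targetDirectory and stores the total in fileCount.
int countFiles(const char* targetDirectory, long& fileCount)
{
  int result = COUNT_OK;
  fileCount = 0L;
  DIR* dirp = opendir(targetDirectory);
  if(dirp == NULL)
  {
    return OPEN_ERROR;
  }
  struct dirent* dirEntry;
  for(;;)
  {
    // readdir returns NULL both at end of directory and on error;
    // only errno tells them apart, so clear it before each call.
    errno = 0;
    dirEntry = readdir(dirp);
    if(dirEntry == NULL)
    {
      if(errno != 0)
      {
        result = READ_ERROR;
      }
      break;
    }
    // Only count files, not subdirectories, etc...
    // Some filesystems report DT_UNKNOWN here; those entries are skipped.
    if(dirEntry->d_type == DT_REG)
    {
      fileCount++;
    }
  }
  // Close the handle before returning so it is not leaked.
  closedir(dirp);
  return result;
}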

kitepilot at kitepilot.com wrote:
> NAHW, that's easy:
> ls|wc -l
> Or I could, (couldn't I)
> find . -type f|wc -l
> There are other tricks, like:
> du -a|wc -l
> Etc, etc, etc...
> There are also implications of depth, whether I want only files, and on,
> and on, and on...
> I'll keep it easy:
> It is a directory that only contains files.
> So, you'd say:
> what's wrong with "ls|wc -l" ?
> Well, here is the catch:
> There are almost a million files in that directory.
> And it gets worse:
> This count has to be placed in a loop in a shell script to report a
> second-to-second delta.
> The truth is that find takes some 3 seconds to do the count.
> What about ls without sorting?
> That was almost 15 seconds.
> Now, directories are files.
> It would be great if I could "count lines" on that file or somehow
> interrogate it "how many lines do you have?" without actually hitting
> the filesystem for the count.
> I'm considering writing a little C utility to do just that, but...
> "struct stat" (my first shot) doesn't contain that information either.
> Finally, the question is:
> Is there a utility that would tell me QUICK a file count under a
> directory (regardless of type)?
> And if not, are there C/C++ system calls that would tell me that?
> Thanks everyone!   :)
> Enrique A. Troconis
> ---------------------------------------------------
> PLUG-discuss mailing list - PLUG-discuss at lists.plug.phoenix.az.us
> To subscribe, unsubscribe, or to change your mail settings:
> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
> 
