finding duplicates?

Lynn David Newton plug-discuss@lists.plug.phoenix.az.us
Thu, 28 Feb 2002 12:28:53 -0700


  David> How duplicate?  Names?  Contents?  Name + Contents?

  David> diff has some support for comparing directories.

  David> uniq has a potentially helpful -d flag.

  David> I smell a find coming up.

find alone would be inadequate: it can locate files,
but it cannot compare their contents. I don't intend to
write the script for the inquirer, but assuming that
duplication in content is the intent, md5sum should
suffice to spot the files, since identical contents
always yield identical checksums. Some form of the
following should pretty well handle the guts of what
is needed. Add bells and whistles to taste:

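#!/bin/ksh
# Checksum every regular file under the given paths
# (following symlinks) and print each line whose
# 32-character MD5 hash occurs more than once.
# uniq -w and -D are GNU extensions.
find "$@" -follow -type f |
  while IFS= read -r f; do md5sum "$f"; done |
  sort | uniq -w 32 -D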
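A minimal sketch of the same pipeline wrapped in a shell
function, assuming GNU find, md5sum and uniq. It uses
find -L (the POSIX spelling of -follow) and
-exec md5sum {} +, which hands filenames to md5sum in
batches, so names containing spaces survive and no shell
loop is needed. The function name find_dups is
hypothetical.

```shell
# Sketch: report files under the given paths whose contents are identical.
# Assumes GNU findutils and coreutils (uniq -w and -D are GNU extensions).
# find_dups is a hypothetical name chosen for this example.
find_dups() {
  # -L follows symlinks; -exec ... {} + passes many filenames per
  # md5sum call without word-splitting them through a shell loop.
  find -L "$@" -type f -exec md5sum {} + |
    sort |
    uniq -w 32 -D   # compare only the 32-char hash; print every repeated line
}
```

Called as, say, find_dups ~/photos /mnt/backup (paths
purely illustrative), it prints one "hash  filename" line
per duplicate, grouped by hash.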

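Note that this will report linked files as duplicates:
hard links, and symbolic links (which -follow
dereferences), are listed alongside the files they
point to.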
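If linked files should not count, one possible
refinement (a sketch, assuming GNU find's -printf
extension; the function name find_dups_nolinks is
hypothetical) is to print each file's device and inode
numbers first and keep a single path per device:inode
pair before checksumming. With -L, find reports the
inode of a symlink's target, so symlinks collapse
together with hard links. Filenames containing newlines
are not handled.

```shell
# Sketch: like the pipeline above, but each physical file (device:inode)
# is checksummed only once, so hard links and followed symlinks are not
# reported as duplicates of their targets.
# Assumes GNU find (-printf) and GNU uniq (-w, -D).
# find_dups_nolinks is a hypothetical name chosen for this example.
find_dups_nolinks() {
  find -L "$@" -type f -printf '%D:%i %p\n' |
    LC_ALL=C sort -u -k1,1 |   # keep one line per device:inode key
    cut -d' ' -f2- |           # drop the key, keep the (possibly spaced) path
    while IFS= read -r f; do md5sum "$f"; done |
    sort |
    uniq -w 32 -D              # print every line whose hash repeats
}
```

Which of several linked names survives is arbitrary;
only one of them is checksummed.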

-- 
Lynn David Newton
Phoenix, AZ