finding duplicates?

David A. Sinck plug-discuss@lists.plug.phoenix.az.us
Thu, 28 Feb 2002 12:41:09 -0700


\_ SMTP quoth Lynn David Newton on 2/28/2002 12:28 as having spake thusly:
\_
\_ [...]
\_   David> I smell a find coming up.
\_ 
\_ find alone would be inadequate. 

Heh, yeah.  Amazingly, find doesn't have a --duplicate-files option
:-)

\_ [...]
\_ 
\_ #!/bin/ksh
\_ find "$@" -follow -type f |
\_   while IFS= read -r f; do md5sum "$f"; done |
\_   sort | uniq -w 32 -D

I think running md5sum on every file might be a bit of a CPU heater.
If I were inclined to be nice to the CPU, I'd check sizes first and
only md5sum the files whose sizes match, since files of different
sizes can't be duplicates...unless your coffee's cold or you have
cycles to burn.  :-)
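
Something along these lines is what I have in mind.  It's only a rough
sketch, not a tested tool: the function name find_dups is made up for
illustration, and it assumes GNU find (for -printf), GNU md5sum, GNU
uniq (for -w and -D), and file names without embedded newlines.

```shell
#!/bin/sh
# Sketch only: assumes GNU find, md5sum, and uniq, and file names that
# contain no newline characters.  find_dups is a hypothetical name.
find_dups() {
    # 1. Print "size<TAB>path" for every regular file.
    find "$@" -follow -type f -printf '%s\t%p\n' |
    # 2. Keep only paths whose size is shared with at least one other file.
    awk -F '\t' '
        { size[NR] = $1; path[NR] = substr($0, length($1) + 2); count[$1]++ }
        END { for (i = 1; i <= NR; i++) if (count[size[i]] > 1) print path[i] }
    ' |
    # 3. Checksum just those candidates and show groups with equal sums.
    while IFS= read -r f; do md5sum "$f"; done |
    sort | uniq -w 32 -D
}
```

Called as "find_dups dir1 dir2", it should print the same kind of
md5sum lines as the script above, but a file with a unique size never
gets read at all.  How much that saves depends on how many files
happen to share a size.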

David