On Feb 28, 10:33am, J.Francois wrote:
> On Thu, Feb 28, 2002 at 12:07:38PM -0500, Mike wrote:
> > I am looking for a command I can run on the command line (from cron)
> > which finds/searches (recursively) for duplicate files.
>
> http://www.google.com/search?hl=en&q=linux+find+duplicate+files
> http://www.perlmonks.org/index.pl?node_id=2712&lastnode_id=1747

The script below is similar to the solution on the perlmonks page, but
is perhaps somewhat simpler:

--- find-dups ---
#!/usr/bin/perl -w
use File::Find;
use Digest::MD5 qw(md5_hex);

undef $/;                # slurp entire files

my %h;
find( sub {
        if (! -d && -r) {
            open F, $_ or return;   # find() chdirs into each directory,
                                    # so open by basename, not full path
            push @{$h{md5_hex(<F>)}}, $File::Find::name;
            close F;
        }
    }, shift || "." );

while (my ($k, $v) = each %h) {
    print join("\n ", sort(@$v)), "\n" if @$v > 1;
}
--- end find-dups ---

When I run it in my ptests directory (which is where I keep most of the
perl scripts that I write before deploying them to some bin directory),
I see the following:

ocotillo:ptests$ ./find-dups
./logfile
 ./logfile2
./flashcards.pl
 ./mathproblems.pl

I.e., this means that logfile and logfile2 are the same and that
flashcards.pl and mathproblems.pl are the same.

If I do the following...

ocotillo:ptests$ cp find-dups dup1
ocotillo:ptests$ cp find-dups dup2

and then run find-dups again, I see:

ocotillo:ptests$ ./find-dups
./logfile
 ./logfile2
./dup1
 ./dup2
 ./find-dups
./flashcards.pl
 ./mathproblems.pl
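
One caveat: find-dups reads and hashes every file under the starting
directory, which can get slow on a big tree. A cheap refinement is to
group files by size first and only hash the sizes that occur more than
once, since files of different sizes can't possibly be identical. Here
is a rough, untested sketch of that variant (the two-pass structure and
the find-dups-by-size name are my own, not from the perlmonks script):

--- find-dups-by-size ---
#!/usr/bin/perl -w
# Sketch (untested): size-prefilter variant of find-dups.
use strict;
use File::Find;
use Digest::MD5 qw(md5_hex);

# Pass 1: group names by file size; only same-sized files can match.
my %by_size;
find( sub {
        # "-s _" reuses the stat() already done by the -r test
        push @{$by_size{-s _}}, $File::Find::name if ! -d && -r;
    }, shift || "." );

# Pass 2: hash only the candidates (sizes seen more than once).
# find() restores the cwd when it returns, so the recorded
# $File::Find::name paths are still valid here.
undef $/;                # slurp entire files
my %h;
for my $files (grep { @$_ > 1 } values %by_size) {
    for my $f (@$files) {
        open my $fh, "<", $f or next;
        push @{$h{md5_hex(<$fh>)}}, $f;
        close $fh;
    }
}

while (my ($k, $v) = each %h) {
    print join("\n ", sort(@$v)), "\n" if @$v > 1;
}
--- end find-dups-by-size ---

For a directory full of small perl scripts the difference won't be
noticeable, but on something like a home directory with large tarballs
it avoids reading most of the data entirely.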