How to locate all duplicate files?

Kevin Fries kfries6 at gmail.com
Wed Apr 21 14:48:47 MST 2010


OK, you have several questions...

- First, a simple script to find all duplicate filenames.
The problem is that you need to get a list of all files on your system,
then compare the names, minus the path.  So I would try something like this
(not fully tested):

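#!/bin/bash

# List every regular file (-P: never follow symlinks).
find -P / -type f > /tmp/files.txt
# Strip the leading path, leaving only the filename.
sed -i -e 's#.*/\(.*\)$#\1#' /tmp/files.txt
# Sort so identical names are adjacent, which uniq requires.
sort /tmp/files.txt > /tmp/files1.txt
rm /tmp/files.txt
# Keep only the names that occur more than once (-D is a GNU option).
uniq -D /tmp/files1.txt > /tmp/files.txt
rm /tmp/files1.txt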

My logic:
  First, get a list of all regular files, ignoring symlinks (which are
duplicates by definition).
  Next, strip the path from the names in the temp file.
  Now that you only have filenames, sort the list into a second temp file.
  Delete the original file.
  Now find all the duplicates, and place those names back into the original
file.
  Delete the second temp file.

Now /tmp/files.txt should hold a list of all duplicate filenames.
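
The same steps can also be sketched as a single pipeline with no temp files, which makes it easy to point at one directory instead of the whole system. This is only a sketch: dup_names is a made-up helper name, it assumes GNU find (for -printf), and it assumes no filename contains a newline.

```shell
#!/bin/bash
# Sketch: list filenames that occur more than once under a directory.
# dup_names is a hypothetical helper name, not part of any package.
# Assumes GNU find (-printf) and filenames without embedded newlines.
dup_names() {
    local dir="${1:-.}"
    # %f prints the name with leading directories removed, so no sed
    # step is needed. sort groups equal names together, and uniq -d
    # prints each repeated name once.
    find -P "$dir" -type f -printf '%f\n' | sort | uniq -d
}
```

Calling dup_names /home/joe (the path is just an example) would print each repeated name once, rather than every copy as uniq -D does.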

- How can I tell if they are just duplicate filenames, or if they are
actually duplicate files?
For each filename, find all copies of the file with the find command (the
example searches /tmp; substitute the directory you care about), and run
them through sha1sum like so:

find /tmp -type f -name '<filename to check>' -exec sha1sum {} +

Files with the same SHA-1 sum should have identical contents.
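
As a variation on the above, you could hash everything under a directory in one pass and let uniq group the matching checksums, instead of checking one filename at a time. Again, this is only a sketch: same_content is a made-up helper name, and the -w and --all-repeated options are GNU uniq extensions, so this assumes GNU coreutils. It also assumes filenames without newlines or backslashes, since sha1sum escapes those and changes the line layout.

```shell
#!/bin/bash
# Sketch: print groups of files under a directory with identical contents.
# same_content is a hypothetical helper name, not part of any package.
# Assumes GNU coreutils (uniq -w and --all-repeated are GNU extensions).
same_content() {
    local dir="${1:-.}"
    # sha1sum prints "<40 hex digits>  <path>" per file. Sorting puts
    # equal digests next to each other, and uniq -w 40 compares only the
    # digest, printing every line whose digest repeats. Groups are
    # separated by a blank line.
    find -P "$dir" -type f -exec sha1sum {} + \
        | sort \
        | uniq -w 40 --all-repeated=separate
}
```

Files that share a name but not a checksum will not show up in this output, which answers the "same name, different content" half of the question by elimination.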

You may need to check my syntax on some of this, but it should get the job
done.

Kevin Fries
On Wed, Apr 21, 2010 at 1:53 PM, <joe at actionline.com> wrote:

>
> What command syntax can I use to locate all duplicate files (filenames) on
> my system?  Or, more specifically, within any specified directory on the
> system?
>
> Also, how can I tell which duplicates have identical contents and which
> duplicates have different content (or at least different file sizes)?
>
>
>
> ---------------------------------------------------
> PLUG-discuss mailing list - PLUG-discuss at lists.plug.phoenix.az.us
> To subscribe, unsubscribe, or to change your mail settings:
> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
>