<div dir="ltr">thank you so much! After running it I find it only finds the duplicates in ~. I need to find the duplicates across all the directories under home. after looking at the man file and searching for recu it seems it recurses by default unless I am reading it wrong. <div>I tried the uniq command but:<div><br></div><div> uniq -c -d -w list.of.files<br> uniq: list.of.files: invalid number of bytes to compare</div><div><br></div><div>isn't uniq used to find the differences between two files? I have a very rudimentary understanding of linux so I'm sure I'm wrong</div><div><br></div><div>all the files in list.of.files are invisible files. (prefaced with a period))<br>and isn't there a way to sort things depending on their column (column1 md5sum, column2 file name)</div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Sep 30, 2024 at 2:56 AM Rusty Carruth via PLUG-discuss <<a href="mailto:plug-discuss@lists.phxlinux.org">plug-discuss@lists.phxlinux.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
On 9/28/24 21:06, Michael via PLUG-discuss wrote:<br>
> About a year ago I messed up by accidentally copying a folder with other<br>
> folders into another folder. I'm running out of room and need to find that<br>
> directory tree and get rid of it. All I know for certain is that it is<br>
> somewhere in my home directory. I THINK it is my pictures directory with<br>
> ARW files.<br>
> chatgpt told me to use fdupes but it told me to use an exclude option<br>
> (which I found out it doesn't have) to avoid config files (and I was<br>
> planning on adding to that as I discovered other stuff I didn't want). then<br>
> it told me to use find but I got an error which leads me to believe it<br>
> doesn't know what it's talking about!<br>
> Could someone help me out?<br>
><br>
First, someone said you need to run updatedb before running find. No, <br>
sorry, updatedb is for using locate, not find. Find actively walks the <br>
directory tree. Locate searches the text (I think) database built by <br>
updatedb.<br>
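To illustrate the difference: find answers from the live filesystem, while locate answers from updatedb's database. A minimal sketch, using a throwaway directory (the names are made up):

```shell
# find walks the directory tree at the moment you run it,
# so a brand-new directory shows up immediately:
d=$(mktemp -d)
mkdir "$d/photos" "$d/photos_copy"
find "$d" -mindepth 1 -type d   # lists both new directories right away

# locate, by contrast, answers from the database built by updatedb
# (usually refreshed by a root cron job), so until updatedb runs again:
#   locate photos_copy          # would find nothing yet
rm -r "$d"
```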
<br>
<br>
Ok, now to answer the question. I've got a similar situation, but in <br>
spades. Every time I did a backup, I did an entire copy of everything, <br>
so I've got ... oh, 10, 20, 30 copies of many things. I'm working on <br>
scripts to help reduce that, but for now doing it somewhat manually, I <br>
suggest the following command:<br>
<br>
<br>
cd (the directory of interest, possibly your home dir) ; find . -type f <br>
-print0 | xargs -0 md5sum | sort > list.of.files<br>
<br>
This will create a list of files, sorted by their md5sum. If you want <br>
to be lazy and not search that file for duplicate md5sums, consider <br>
uniq. Like this:<br>
<br>
uniq -c -d -w 32 list.of.files<br>
<br>
<br>
This will print the list of files which are duplicates. For example, <br>
out of a list of 42,279 files in a certain directory on my computer, <br>
here's the result:<br>
<br>
2 73d249df037f6e63022e5cfa8d0c959b <br>
_files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160321-223138.png<br>
5 9b162ac35214691461cc0f0104fb91ce <br>
_files/melissa/Documents/EPHESUS/Office Stuff/SPD/SPD SUMMER 2016 (1).pdf<br>
3 b396af67f2cd75658397efd878a01fb8 <br>
_files/dads_zipdisks/2003-1/CLASS at VBC Sp-03/CLASS BKUP - Music <br>
Reading & Sight Singing Class/C & D Major & Minor Scales & Chords.mct<br>
2 cd83094e0c4aeb9128806b5168444578 <br>
_files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160318-222051.png<br>
2 d1a5a1bec046cc85a3a3fd53a8d5be86 <br>
_files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160410-145331.png<br>
2 fa681c54a2bd7cfa590ddb8cf6ca1cea <br>
_files/from_ebay_pc/pics_and_such_from_work/phone_backup/try2_nonptp_or_whatever/Pictures/Screenshots/Screenshot_20160312-113340.png<br>
<br>
Originally the _files directory had MANY duplicates, now I've managed to <br>
get that down to the above list...<br>
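
A quick end-to-end sketch of the above, assuming GNU coreutils; the directory and file names are made up. Note that find recurses into subdirectories by default, and that uniq's -w option takes a number of characters to compare (an MD5 digest is 32 hex characters):

```shell
# Build a throwaway tree with a duplicate buried one level down:
d=$(mktemp -d)
mkdir -p "$d/pics/backup"
printf 'raw data\n' > "$d/pics/IMG_0001.ARW"
printf 'raw data\n' > "$d/pics/backup/IMG_0001.ARW"   # same content, deeper down
printf 'other\n'    > "$d/pics/notes.txt"

# Hash every file recursively, sort by hash, then have uniq compare
# only the first 32 characters (the MD5 digest) of each line:
( cd "$d" && find . -type f -print0 | xargs -0 md5sum | sort ) > list.of.files
uniq -c -d -w 32 list.of.files   # one line per duplicated hash, with its count
rm -r "$d" list.of.files
```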
<br>
Anyway, there you go. Happy scripting.<br>
<br>
---------------------------------------------------<br>
PLUG-discuss mailing list: <a href="mailto:PLUG-discuss@lists.phxlinux.org" target="_blank">PLUG-discuss@lists.phxlinux.org</a><br>
To subscribe, unsubscribe, or to change your mail settings:<br>
<a href="https://lists.phxlinux.org/mailman/listinfo/plug-discuss" rel="noreferrer" target="_blank">https://lists.phxlinux.org/mailman/listinfo/plug-discuss</a><br>
</blockquote></div><br clear="all"><div><br></div><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><span style="font-size:12.8px">:-)~MIKE~(-:</span><br></div></div></div></div></div>