Python help (finding duplicates)

Joseph Sinclair plug-discussion at stcaz.net
Fri Aug 27 23:50:00 MST 2010


I hope these are small files; the algorithm you wrote compares every line of one file against every line of the other, so it is not going to run well as the files get large (over 10,000 entries).
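As a rough sketch of an alternative (assuming entries match as exact strings once whitespace is stripped, and that "marking" means prefixing the entry with '#'; the name mark_duplicates is just something I made up for illustration), you can load the first file's entries into a set and then make a single pass over the second file. Set lookups are constant-time on average, so the work grows with the total number of lines rather than with the product of the two file sizes:

```python
def mark_duplicates(primary_lines, secondary_lines):
    """Return the secondary entries, with '#' prefixed to any entry
    that also appears in primary_lines.

    Hypothetical helper: entries are compared as stripped strings and
    the original order of secondary_lines is preserved.
    """
    # Build the lookup set once; membership tests are O(1) on average.
    seen = set(line.strip() for line in primary_lines)
    marked = []
    for line in secondary_lines:
        entry = line.strip()
        if entry and entry in seen:
            marked.append('#' + entry)
        else:
            marked.append(entry)
    return marked


if __name__ == '__main__':
    # Sample values taken from the example files in the original post.
    primary = ['433.3\n', '543.1\n', '741.1\n', '859.2\n']
    secondary = ['947.3\n', '749.0\n', '859.2\n', '433.3\n']
    for entry in mark_duplicates(primary, secondary):
        print(entry)
```

In real use you would pass the two open file objects (or their readlines() output) and write the returned entries to a new file rather than trying to modify the second file in place.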
Have you checked the space/tab situation?  Python uses indentation changes to indicate the end of a block, so inconsistent use of tabs and spaces can either raise an indentation error or quietly put a line in a different block than the one you intended.
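If you want a quick way to check, here is a minimal sketch (indent_styles is a hypothetical helper name, not anything built in) that reports which lines of a script are indented with tabs, with spaces, or with a mix of both. If more than one of the lists comes back non-empty, the indentation is inconsistent:

```python
def indent_styles(source_text):
    """Group 1-based line numbers by the kind of leading whitespace used."""
    styles = {'tabs': [], 'spaces': [], 'mixed': []}
    for number, line in enumerate(source_text.splitlines(), 1):
        stripped = line.lstrip(' \t')
        indent = line[:len(line) - len(stripped)]
        if not indent or not stripped:
            continue  # skip unindented lines and blank lines
        if ' ' in indent and '\t' in indent:
            styles['mixed'].append(number)
        elif '\t' in indent:
            styles['tabs'].append(number)
        else:
            styles['spaces'].append(number)
    return styles
```

The standard library's tabnanny module does a more thorough job of the same check; you can run it against a script with "python -m tabnanny" followed by the script's filename.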
Here are a few questions:
Are these always numbers?
Do the files have to remain in their original order, or can you reorder them during processing?
How often does this have to run?
Do you have to "comment" the duplicate, or can you remove it?
Are there any other requirements not obvious from the description below?
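To show why the first question matters: if the entries are always numbers, then 433.3 and 433.30 are presumably the same value, but they are not the same string, so a plain text comparison would miss that duplicate. A minimal sketch of one way to handle it (normalize is a hypothetical helper, and falling back to the raw text for non-numeric entries is my assumption about what you would want) is to convert each entry to a Decimal before comparing:

```python
from decimal import Decimal, InvalidOperation


def normalize(entry):
    """Return a Decimal for numeric entries, else the stripped text.

    Hypothetical helper: lets '433.3' and '433.30' compare (and hash)
    as equal, while non-numeric entries are compared as plain strings.
    """
    text = entry.strip()
    try:
        return Decimal(text)
    except InvalidOperation:
        # Not a number; fall back to comparing the text itself.
        return text
```

Because equal Decimal values also hash the same, the normalized entries can be stored in a set or used as dictionary keys.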

Kevin Faulkner wrote:
> I was trying to pull duplicates out of 2 different files. Where there are 
> duplicates, I would place a # next to the duplicate. Example files:
> file 1:	file 2:
> 433.3	947.3
> 543.1	749.0
> 741.1	859.2
> 238.5	433.3
> 839.2	229.1
> 583.6	990.1
> 863.4	741.1
> 859.2	101.8
> 
> import string
> i=1
> primaryfile = open('/tmp/extract','r')
> secondaryfile = open('/tmp/unload')
> for line in primaryfile:
>    pcompare = line
>    print(pcompare)
>    for row in secondaryfile:
>      i = i + 1
>      print(i)
>      scompare = row
>      if pcompare == scompare:
>        print(scompare)
>        secondaryfile.write('#')
> With this code it should go through the files, find a duplicate, and place a 
> '#' next to it. But for some reason it doesn't even get to the second for 
> statement. I don't know what else to do. Please offer some assistance. :)
> ---------------------------------------------------
> PLUG-discuss mailing list - PLUG-discuss at lists.plug.phoenix.az.us
> To subscribe, unsubscribe, or to change your mail settings:
> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
> 
