Re: Python help (finding duplicates)

> OK,
> I've attached a complete program that works, if you want to just get it
> done, but I've also described what went wrong in your first attempt below.

> # the i value was just for debugging, so I dropped it
> primaryfile = open('/tmp/extract','r')
> # read the primary file into a list for speed and so you aren't reading
> # more than once
> primary_lines = primaryfile.readlines()
> # you didn't specify a mode for this, so it defaulted to read-only.  Be
> # explicit for clarity
> secondaryfile = open('/tmp/unload', 'r')
> # Open a separate file for output, otherwise you would have been writing
> # and reading the same file over and over again, which usually causes errors
> outputfile = open('/tmp/result-file', 'w')
> # read the second file into a list, then you can scan through it over and
> # over without hammering disk and re-reading a file you might have modified.
> secondary_lines = secondaryfile.readlines()
> # print is a statement, not a function.
> print 'opened files'
> # loop through the list, not the file
> for line in primary_lines:
>    pcompare = line
>    # print is a statement, use the formatting operator to print variable
>    # values
>    print 'primary line = %s' % (pcompare)
>    # loop through the list, not the file
>    for row in secondary_lines:
>      scompare = row
>      if pcompare == scompare:
>        # print as a statement, not a function
>        print 'secondary line = %s' % (scompare)
>        # you were writing random # characters in a file (most likely after
>        # the line read), this writes a comment to a new file, which is usually
>        # clearer.
>        # invert the test, and add the line to a set here then write out
>        # the set at the end to get an output of lines without duplication.
>        outputfile.write('#%s' % (scompare))
> print 'Done'
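
The last comment in the quoted program suggests collecting lines in a set so the output has no repeats. Here is a rough sketch of how that could look. It is not the attached program: the function names find_duplicates and find_unmatched are made up for illustration, and reading "invert the test" as "keep the primary lines that have no match in the secondary file" is my assumption about what was meant. Turning the secondary lines into a set also replaces the inner loop with a single membership test per primary line. The code avoids print, so it should run unchanged on Python 2 and Python 3.

```python
def find_duplicates(primary_lines, secondary_lines):
    """Return primary lines that also occur in secondary_lines.

    Each matching line is returned once, in the order it first appears
    in primary_lines. (Hypothetical helper, not from the original post.)
    """
    secondary = set(secondary_lines)  # set lookup replaces the inner loop
    seen = set()                      # lines already reported
    duplicates = []
    for line in primary_lines:
        if line in secondary and line not in seen:
            seen.add(line)
            duplicates.append(line)
    return duplicates


def find_unmatched(primary_lines, secondary_lines):
    """Return primary lines that do NOT occur in secondary_lines.

    This is the assumed meaning of "invert the test": same scan as above,
    with the membership check negated. Each line is returned once.
    """
    secondary = set(secondary_lines)
    seen = set()
    unmatched = []
    for line in primary_lines:
        if line not in secondary and line not in seen:
            seen.add(line)
            unmatched.append(line)
    return unmatched
```

To use it with the files from the quoted program, pass in the two lists returned by readlines() and write each returned line to outputfile. Lines are compared exactly as read, trailing newline included, which matches the == test in the nested loops.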
