Re: Python help (finding duplicates)

# the i value was just for debugging, so I dropped it
primaryfile = open('/tmp/extract','r')
# read the primary file into a list for speed and so you aren't reading more than once
primary_lines = primaryfile.readlines()
# you didn't specify a mode for this, so it defaulted to read-only.  Be explicit for clarity
secondaryfile = open('/tmp/unload', 'r')
# Open a separate file for output. Otherwise you would be reading and writing the same file over and over, which usually causes errors.
outputfile = open('/tmp/result-file', 'w')
# read the second file into a list, then you can scan through it over and over without hammering disk and re-reading a file you might have modified.
secondary_lines = secondaryfile.readlines()
# In Python 2, print is a statement, not a function.
print 'opened files'
# loop through the list, not the file
for line in primary_lines:
    pcompare = line
    # print is a statement, so use the formatting operator to print variable values
    print 'primary line = %s' % (pcompare)
    # loop through the list, not the file
    for row in secondary_lines:
        scompare = row
        if pcompare == scompare:
            # print as a statement, not a function
            print 'secondary line = %s' % (scompare)
            # You were writing stray # characters into a file (most likely right after the line just read).
            # This writes the matching line as a comment to a new file instead, which is usually clearer.
            # Alternatively, invert the test, add the line to a set here, and write out the set at the end to get output without duplicated lines.
            outputfile.write('#%s' % (scompare))
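# close the files so everything written is flushed to disk
primaryfile.close()
secondaryfile.close()
outputfile.close()
print 'Done'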
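
The set idea mentioned in the comments could look roughly like the sketch below. This is only one way to read that suggestion, not a drop-in replacement: the function name split_by_membership and its variable names are made up for illustration, and it strips the trailing newline before comparing, which the script above does not do. It avoids print, so it should run the same under Python 2 and Python 3.

```python
def split_by_membership(primary_lines, secondary_lines):
    """Split secondary_lines into (duplicates, unique).

    duplicates: lines that also appear in primary_lines
    unique:     lines that do not
    Each distinct line is reported once, in first-seen order.
    (Hypothetical helper, named here only for illustration.)
    """
    # A set gives fast membership tests, so there is no nested loop.
    # Assumption: trailing newlines are stripped so that a last line
    # without a newline still matches.
    primary_set = set(line.rstrip('\n') for line in primary_lines)

    seen = set()        # secondary lines already handled
    duplicates = []
    unique = []
    for row in secondary_lines:
        key = row.rstrip('\n')
        if key in seen:
            continue    # skip repeats within the secondary file
        seen.add(key)
        if key in primary_set:
            duplicates.append(key)
        else:
            unique.append(key)
    return duplicates, unique
```

You would pass in the two lists returned by readlines() and then write out whichever of the two result lists you need, adding the newline back on each line as you write it.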
