Re: Python help (finding duplicates)

> I really appreciate what you have done. I especially like the description of
> what I did wrong. Using readlines() is a better approach, like you said: less
> disk thrashing. I was using /usr/bin/python3, where print() is a function.
> My next step is to take the host list and identify where each IP is located
> using pygeoip.
> Thank you again. :)
> -Kevin
>> # the i value was just for debugging, so I dropped it
>> primaryfile = open('/tmp/extract', 'r')
>> # read the primary file into a list for speed and so you aren't reading
>> # it more than once
>> primary_lines = primaryfile.readlines()
>> # you didn't specify a mode for this, so it defaulted to read-only. Be
>> # explicit for clarity
>> secondaryfile = open('/tmp/unload', 'r')
>> # open a separate file for output; otherwise you would have been writing
>> # and reading the same file over and over again, which usually causes errors
>> outputfile = open('/tmp/result-file', 'w')
>> # read the second file into a list, then you can scan through it over and
>> # over without hammering the disk or re-reading a file you might have
>> # modified
>> secondary_lines = secondaryfile.readlines()
>> # print is a statement in Python 2, not a function; in Python 3 it is a
>> # function, and the parenthesized form used here works in both
>> print('opened files')
>> # loop through the list, not the file
>> for line in primary_lines:
>>     pcompare = line
>>     # use the % formatting operator to print variable values
>>     print('primary line = %s' % pcompare)
>>     # loop through the list, not the file
>>     for row in secondary_lines:
>>         scompare = row
>>         if pcompare == scompare:
>>             print('secondary line = %s' % scompare)
>>             # you were writing stray # characters into a file (most likely
>>             # right after the line just read); this writes a commented copy
>>             # of the line to a new file instead, which is usually clearer.
>>             # Invert the test, add the line to a set here, then write out
>>             # the set at the end to get output without duplicates.
>>             outputfile.write('#%s' % scompare)
>> # close the output so everything is flushed to disk
>> outputfile.close()
>> print('Done')
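For comparison, here is a minimal Python 3 sketch of the set-based idea mentioned in the last comment of the quoted code. It is one possible approach, not the poster's code: it reports lines that appear in both files, each exactly once, using a set for membership tests instead of the nested loop. The helper names `find_duplicates` and `write_duplicates` are hypothetical; the paths are the ones used in the thread.

```python
def find_duplicates(primary_lines, secondary_lines):
    """Return the lines of primary_lines that also occur in secondary_lines.

    Each matching line is returned once, in the order it first appears
    in primary_lines.
    """
    # A set gives average O(1) membership tests, so each list is scanned
    # once instead of comparing every pair of lines in a nested loop.
    secondary = set(secondary_lines)
    seen = set()
    result = []
    for line in primary_lines:
        if line in secondary and line not in seen:
            seen.add(line)
            result.append(line)
    return result


def write_duplicates(primary_path, secondary_path, output_path):
    """Write each line shared by both input files, prefixed with '#', once.

    Returns the number of lines written.
    """
    # 'with' closes each file automatically, even if an exception is raised.
    with open(primary_path, 'r') as primaryfile:
        primary_lines = primaryfile.readlines()
    with open(secondary_path, 'r') as secondaryfile:
        secondary_lines = secondaryfile.readlines()
    duplicates = find_duplicates(primary_lines, secondary_lines)
    with open(output_path, 'w') as outputfile:
        for line in duplicates:
            outputfile.write('#%s' % line)
    return len(duplicates)


# Usage with the paths from the thread:
# write_duplicates('/tmp/extract', '/tmp/unload', '/tmp/result-file')
```

Lines are compared exactly as read, including the trailing newline, so a last line without a newline will not match an otherwise identical line elsewhere; strip the lines first if that matters. To list lines that are *not* in the other file, flip the `line in secondary` test.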

Kevin Faulkner at