OK, I've attached a complete program that works, if you just want to get it
done, and I've also described what went wrong in your first attempt in the
comments below.

# the i value was just for debugging, so I dropped it
primaryfile = open('/tmp/extract','r')
# read the primary file into a list for speed and so you aren't reading it
# more than once
primary_lines = primaryfile.readlines()
# you didn't specify a mode for this, so it defaulted to read-only. Be
# explicit for clarity
secondaryfile = open('/tmp/unload', 'r')
# Open a separate file for output, otherwise you would have been writing and
# reading the same file over and over again, which usually causes errors
outputfile = open('/tmp/result-file', 'w')
# read the second file into a list, then you can scan through it over and
# over without hammering disk and re-reading a file you might have modified.
secondary_lines = secondaryfile.readlines()
# in Python 2, print is a statement, not a function.
print 'opened files'
# loop through the list, not the file
for line in primary_lines:
    pcompare = line
    # print is a statement, use the formatting operator to print variable
    # values
    print 'primary line = %s' % (pcompare)
    # loop through the list, not the file
    for row in secondary_lines:
        scompare = row
        if pcompare == scompare:
            # print as a statement, not a function
            print 'secondary line = %s' % (scompare)
            # you were writing stray # characters into the file you were
            # reading (most likely after the line just read). This writes a
            # comment to a new file instead, which is usually clearer.
            # invert the test, and add the line to a set here, then write
            # out the set at the end to get an output of lines without
            # duplication.
            outputfile.write('#%s' % (scompare))
# close the files so the output is flushed to disk
primaryfile.close()
secondaryfile.close()
outputfile.close()
print 'Done'

Kevin Faulkner wrote:
> Sorry about the time issue.
> On Friday 27 August 2010 23:50:00 you wrote:
>> I hope these are small files, the algorithm you wrote is not going to run
>> well as file size gets large (over 10,000 entries). Have you checked the
>> space/tab situation?
>> Python uses indentation changes to indicate the end
>> of a block, so inconsistent use of tabs and spaces freaks it out. Here are
>> a couple questions:
> This is not a school project, so you won't be doing my homework or anything :)
> The space/tab issue is okay, but the script does not even get to the print(i),
> I even tried for line in secondaryfile: and the for loop still wouldn't be
> executed.
>> Are these always numbers?
> Yes, they are IP's from an Apache error log.
>> Do the files have to remain in their original order, or can you reorder
>> them during processing? How often does this have to run?
> they are not in order because one list is 852 entries and another list is 3300
> entries. This script only needs to run once.
>> Do you have to "comment" the duplicate, or can you remove it?
> The plan is to remove it, but I wanted to see if my removal method would work,
> so I was trying to put a comment next to it.
>> Are there any other requirements not obvious from the description below?
> No real requirements, if anyone would like the original files I can give them
> to you, a lot of them are bots.
> Thank you :)
> -Kevin
>> Kevin Faulkner wrote:
>>> I was trying to pull duplicates out of 2 different files. Needless to say
>>> there are duplicates I would place a # next to the duplicate. Example
>>> files:
>>> file 1:    file 2:
>>> 433.3      947.3
>>> 543.1      749.0
>>> 741.1      859.2
>>> 238.5      433.3
>>> 839.2      229.1
>>> 583.6      990.1
>>> 863.4      741.1
>>> 859.2      101.8
>>>
>>> import string
>>> i=1
>>> primaryfile = open('/tmp/extract','r')
>>> secondaryfile = open('/tmp/unload')
>>>
>>> for line in primaryfile:
>>>     pcompare = line
>>>     print(pcompare)
>>>
>>>     for row in secondaryfile:
>>>         i = i + 1
>>>         print(i)
>>>         scompare = row
>>>
>>>         if pcompare == scompare:
>>>             print(scompare)
>>>             secondaryfile.write('#')
>>>
>>> With this code it should go through the files and find a duplicate and
>>> place a '#' next to it.
>>> But for some reason it doesn't even get to
>>> the second for statement. I don't know what else to do. Please offer
>>> some assistance. :)
>>> ---------------------------------------------------
>>> PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
>>> To subscribe, unsubscribe, or to change your mail settings:
>>> http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss
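
P.S. The suggestion in the script comments (collect lines in a set and write
the set out at the end) and the earlier worry about large files can be
sketched roughly as below. This is only an illustration, not the attached
program: find_duplicates and unique_lines are hypothetical helper names, the
sample values are the example entries from the original post, and the code
avoids the print statement so it should run the same under Python 2 and
Python 3. It also strips whitespace before comparing, on the assumption that
a last line without a trailing newline should still match.

```python
# Sketch only: find_duplicates and unique_lines are hypothetical helpers,
# not part of the script in the reply above.

def find_duplicates(primary_lines, secondary_lines):
    """Return entries from primary_lines that also occur in secondary_lines."""
    # Membership tests on a set take constant time on average, so each
    # primary entry costs one lookup instead of a scan of the whole
    # secondary list.
    secondary = set(line.strip() for line in secondary_lines)
    duplicates = []
    for line in primary_lines:
        entry = line.strip()
        if entry and entry in secondary:
            duplicates.append(entry)
    return duplicates


def unique_lines(primary_lines, secondary_lines):
    """Return every distinct entry from both inputs in first-seen order."""
    seen = set()
    result = []
    for line in list(primary_lines) + list(secondary_lines):
        entry = line.strip()
        # Skip blank lines and anything already emitted.
        if entry and entry not in seen:
            seen.add(entry)
            result.append(entry)
    return result


if __name__ == '__main__':
    # Example entries from the original post, as readlines() would return them.
    file1 = ['433.3\n', '543.1\n', '741.1\n', '238.5\n',
             '839.2\n', '583.6\n', '863.4\n', '859.2\n']
    file2 = ['947.3\n', '749.0\n', '859.2\n', '433.3\n',
             '229.1\n', '990.1\n', '741.1\n', '101.8\n']
    print('duplicates: %s' % ', '.join(find_duplicates(file1, file2)))
    print('unique entries: %d' % len(unique_lines(file1, file2)))
```

With lists of 852 and 3300 entries either approach finishes quickly; the set
version mainly matters if the logs grow much larger. To use it on the real
files, pass in the two lists returned by readlines() and write the result of
unique_lines to a separate output file, one entry per line.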