Python anyone?

Author: Joseph T. Tannenbaum
Date:  
Subject: Python anyone?
Hello,

I am going through Mastering Regular Expressions by Jeffrey Friedl
(O'Reilly book) and I got to a snippet that uses a regular expression
in Python. I am using Python 1.5.2 for Win98, and this program doesn't
work correctly. (It is supposed to point out double words in a text
file, e.g. "the the world.") The main part I am in doubt about is
setting up 'data' prior to printing. Is this format correct? It is on
pg 57 of the book, and the reg3 line is correct per the errata.

Thanks
Joe

Here is the snippet:

import sys; import regex; import regsub

### Prepare the three regexes we'll use
reg1 = regex.compile(
            '\\b\([a-z]+\)\(\([\n\r\t\f\v ]\|<[^>]+>\)+\)\(\\1\\b\)',
            regex.casefold)
reg2 = regex.compile('^\([^\033]*\n\)+')
reg3 = regex.compile('^\(.\)')


for filename in sys.argv[1:]:                   # for each file...
    try:
        file = open(filename)                   # try opening file
    except IOError, info:
        print '%s: %s' % (filename, info[1])    # report error if couldn't
        continue                                # and also abort this iteration


    # Slurp the whole file into 'data', apply regexes, and print
    data = file.read()
    data = regsub.gsub(reg1, '\033[7m\\1\033[m\\2\033[7m\\4\033[m', data)
    data = regsub.gsub(reg2, '', data)
    data = regsub.gsub(reg3, filename + ':  \\1', data)
    print data,
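
For comparison, here is a minimal sketch of the same three steps (highlight
the doubled words, drop the lines with no highlight, prefix each remaining
line with the filename) using the 're' module, which replaced 'regex' and
'regsub' in later Python versions. It assumes Python 3 and is not the book's
code. The names find_doubles, DOUBLE, NO_ESC_LINES and LINE_START are made up
for illustration. The inner alternation is non-capturing here, so the second
word is group 3 rather than group 4.

```python
import re
import sys

ESC = '\033'  # escape character that starts the ANSI highlight sequences

# 1. A word, then whitespace and/or <tags>, then the same word again.
DOUBLE = re.compile(r'\b([a-z]+)((?:[\n\r\t\f\v ]|<[^>]+>)+)(\1\b)',
                    re.IGNORECASE)
# 2. Runs of whole lines that contain no escape character (nothing highlighted).
NO_ESC_LINES = re.compile(r'^(?:[^\x1b\n]*\n)+', re.MULTILINE)
# 3. The first character of each remaining line.
LINE_START = re.compile(r'^(.)', re.MULTILINE)


def find_doubles(text, label):
    """Return only the lines of text that hold doubled words, highlighted
    in reverse video and prefixed with label (hypothetical helper)."""
    highlight = (ESC + r'[7m\g<1>' + ESC + r'[m\g<2>' +
                 ESC + r'[7m\g<3>' + ESC + '[m')
    text = DOUBLE.sub(highlight, text)
    text = NO_ESC_LINES.sub('', text)
    # A function replacement avoids backslash surprises in Windows paths.
    return LINE_START.sub(lambda m: label + ':  ' + m.group(1), text)


def main(argv):
    for filename in argv:                       # for each file...
        try:
            with open(filename) as f:
                data = f.read()                 # slurp the whole file
        except OSError as err:
            print('%s: %s' % (filename, err.strerror))
            continue                            # skip files we cannot read
        print(find_doubles(data, filename), end='')


if __name__ == '__main__':
    main(sys.argv[1:])
```

Saved under a name of your choice and run with one or more text files as
arguments, it should print only the lines that contain doubled words.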