perl regex

Author: David A. Sinck
Date:  
Subject: perl regex

\_ SMTP quoth Mike Starke on 3/26/2003 11:11 as having spake thusly:
\_
\_ I am still struggling with an expression to parse
\_ out a file of the following format:
\_
\_ FIELD: data, FIELD: more, data, FIELD: data ....
\_ FIELD: data, FIELD: more, data, FIELD: data ....
\_ FIELD: data, FIELD: more, data, FIELD: data ....
\_ FIELD: data, FIELD: more, data, FIELD: data ....
\_
\_ I thought I had it, but notice above how some data has
\_ a comma contained within; that pretty much ruined
\_ my 'split' function :-)
\_
\_ Each line begins with the word 'TYPE', so I have been able to weed
\_ out everything in the file that is not relevant
\_ by using something like:
\_ if ($_ =~ /^TYPE.*/) {
\_
\_ Beyond this, I just cannot seem to get the proper
\_ expression to grab the fields and their corresponding data.

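while (<>)
  {
    # Split on the literal ' FIELD: ' token (note the leading space);
    # commas inside the data are left alone.
    my @parts = split / FIELD: /;
    print join("\n", @parts);
    print "\n";
  }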



Granted, that assumes every field name is literally ' FIELD: '. That's
what you get when you post an under-generalized example. :-)
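
To see what that literal split actually returns, here is a small sketch
run on one sample line copied from the question. The variable names are
mine, and it only illustrates the split above, nothing more:

```perl
use strict;
use warnings;

# One sample line in the format from the original question.
my $line = "FIELD: data, FIELD: more, data, FIELD: data ....";

# Split on the literal token ' FIELD: ' (note the leading space).
my @parts = split / FIELD: /, $line;

# The first element keeps its 'FIELD: ' prefix because no space
# precedes it, and the trailing commas stay attached to the data.
print "[$_]\n" for @parts;
```

The brackets are only there to make the boundaries of each element
visible, trailing commas included.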

Given:

TYPE FIELD1: data, FIELD2: more, data, FIELDN: data ....
TYPE FIELD1: data, FIELD2: more, data, FIELDN: data ....
TYPE FIELD1: data, FIELD2: more, data, FIELDN: data ....
garbage
TYPE FIELD1: data, FIELD2: more, data, FIELDN: data ....

as input

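while (<>)
  {
    chomp;                       # drop the trailing newline
    next unless (s/^TYPE //);    # skip lines not starting with 'TYPE '
    # The capture keeps each field name in the result list. The first
    # field has no leading ', ', so it arrives whole in $lame.
    my ($lame, @parts) = split /, (\w+:) /;
    print "$lame\n";
    while (@parts)
      {
        # @parts alternates name, data, name, data, ...
        print shift @parts, ' ', shift @parts, "\n";
      }
    print "\n";                  # blank line between records
  }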



That takes it apart. Yes, it's called $lame for a reason: the first
field comes back as one unsplit 'FIELD1: data' string, and I didn't feel
like fighting split any more at the moment. :-)
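
If you would rather not have $lame at all, one option is to let the split
pattern also match a field name at the very start of the line. This is
only a sketch, assuming field names are \w+ followed by ': ' and that
the data never contains something that looks like ', NAME: ' itself;
parse_line is a hypothetical helper name of mine, not anything standard:

```perl
use strict;
use warnings;

# Hypothetical helper: turn one 'TYPE ...' line into a flat list of
# (name, data, name, data, ...). Returns an empty list for lines that
# do not start with 'TYPE '.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ s/^TYPE //;

    # Match a field name either at the start of the string or after
    # ', '. The capture makes split return the names with the data.
    # The match at position 0 yields a leading empty field, which is
    # thrown away by assigning it to undef.
    my (undef, @pairs) = split /(?:^|, )(\w+): /, $line;
    return @pairs;
}

my @pairs = parse_line("TYPE FIELD1: data, FIELD2: more, data, FIELDN: data ....\n");
while (my ($name, $data) = splice(@pairs, 0, 2)) {
    print "$name => $data\n";
}
```

Since the list alternates name and data, it can also be assigned straight
to a hash if the field names are unique within a line.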

David