perl regex

Wed, 26 Mar 2003 09:29:24 -0700

\_ SMTP quoth Mike Starke on 3/26/2003 11:11 as having spake thusly:
\_
\_ I am still struggling with an expression to parse
\_ out a file of the following format:
\_ 
\_ FIELD: data, FIELD: more, data, FIELD: data ....
\_ FIELD: data, FIELD: more, data, FIELD: data ....
\_ FIELD: data, FIELD: more, data, FIELD: data ....
\_ FIELD: data, FIELD: more, data, FIELD: data ....
\_ 
\_ I thought I had it, but notice above how some data has
\_ a comma contained within; that pretty much ruined
\_ my 'split' function :-)
\_ 
\_ Each line begins with the word 'TYPE', so I have been able to weed
\_ out everything in the file that is not relevant
\_ by using something like:
\_ if ($_  =~ /^TYPE.*/) {
\_ 
\_ Beyond this, I just can not seem to get the proper
\_ expression to grab the fields and their corresponding data.

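while (<>)
  {
    # Split on the literal ' FIELD: ' token, so commas inside the data survive.
    # The first piece keeps its leading 'FIELD: ', and pieces keep trailing commas.
    my @parts = split / FIELD: /;
    print join("\n", @parts);
    print "\n";
  }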

Granted, that assumes every field is introduced by the literal token
' FIELD: '.  That's what you get when you post an under-generalized
example. :-)
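
If the fields really are all labelled with a literal 'FIELD: ', a lookahead
in the split pattern keeps each label attached to its data and drops the
separating comma.  This is only a sketch; split_fields and $sample are
names made up for illustration, not anything from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line at every ', ' that is directly
# followed by the literal 'FIELD: '.  The lookahead (?=...) consumes
# nothing, so each piece keeps its own 'FIELD: ' label.
sub split_fields {
    my ($line) = @_;
    chomp $line;
    return split /, (?=FIELD: )/, $line;
}

# Sample line copied from the question; commas inside the data stay put.
my $sample = "FIELD: data, FIELD: more, data, FIELD: data ....\n";
print "$_\n" for split_fields($sample);
```

A comma inside the data only causes trouble here if it happens to be
followed by the text 'FIELD: ' itself.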

Given:

TYPE FIELD1: data, FIELD2: more, data, FIELDN: data ....
TYPE FIELD1: data, FIELD2: more, data, FIELDN: data ....
TYPE FIELD1: data, FIELD2: more, data, FIELDN: data ....
garbage
TYPE FIELD1: data, FIELD2: more, data, FIELDN: data ....

as input

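while (<>)
  {
    # Skip lines that do not start with 'TYPE '; strip the prefix from the rest.
    next unless (s/^TYPE //);
    chomp;
    # The captured (\w+:) separators are returned too, so @parts alternates
    # field name, data; $lame gets the first 'FIELD1: data' chunk whole.
    my ($lame, @parts) = split /, (\w+:) /;
    print "$lame\n";
    while (@parts)
      {
        print shift @parts, ' ', shift @parts, "\n";
      }
    print "\n";
  }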

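That takes it apart.  Yes, it's called $lame for a reason: the first
field has no leading ', ' for split to match, so it comes back as one
'FIELD1: data' chunk, and I didn't feel like fighting split more atm.  :-)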
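
For what it's worth, one way to get rid of $lame is to stop using split
and walk the line with a global match instead.  A minimal sketch, assuming
field names are \w+ followed by ': ' and that no data contains
', SOMETHING: ' itself; parse_line and @sample are hypothetical names used
only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [name, data] pairs for one line,
# or an empty list if the line does not start with 'TYPE '.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ s/^TYPE //;
    my @pairs;
    # \G continues where the previous match stopped.  The data part is
    # matched lazily up to the next ', NAME: ' or the end of the line,
    # so commas inside the data are kept.
    while ($line =~ /\G(\w+): (.*?)(?:, (?=\w+: )|$)/g) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

# Sample input modelled on the lines shown above.
my @sample = (
    "TYPE FIELD1: data, FIELD2: more, data, FIELDN: data ....\n",
    "garbage\n",
);
for my $line (@sample) {
    for my $pair (parse_line($line)) {
        print "$pair->[0]: $pair->[1]\n";
    }
}
```

Every field, including the first, comes back as a name/data pair, so
nothing needs special handling.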

David