perl regex
David A. Sinck
plug-discuss@lists.plug.phoenix.az.us
Wed, 26 Mar 2003 09:29:24 -0700
\_ SMTP quoth Mike Starke on 3/26/2003 11:11 as having spake thusly:
\_
\_ I am still struggling with an expression to parse
\_ out a file of the following format:
\_
\_ FIELD: data, FIELD: more, data, FIELD: data ....
\_ FIELD: data, FIELD: more, data, FIELD: data ....
\_ FIELD: data, FIELD: more, data, FIELD: data ....
\_ FIELD: data, FIELD: more, data, FIELD: data ....
\_
\_ I thought I had it, but notice above how some data has
\_ a comma contained within; that pretty much ruined
\_ my 'split' function :-)
\_
\_ Each line begins with the word 'TYPE', so I have been able to weed
\_ out everything in the file that is not relevant
\_ by using something like:
\_ if ($_ =~ /^TYPE.*/) {
\_
\_ Beyond this, I just cannot seem to get the proper
\_ expression to grab the fields and their corresponding data.
while (<>)
{
    my @parts = split / FIELD: /;
    print join("\n", @parts);
    print "\n";
}
Granted, that assumes that it's all ' FIELD: ' tokenized. That's what
you get when you post an under-generalized example. :-)
Given:
TYPE FIELD1: data, FIELD2: more, data, FIELDN: data ....
TYPE FIELD1: data, FIELD2: more, data, FIELDN: data ....
TYPE FIELD1: data, FIELD2: more, data, FIELDN: data ....
garbage
TYPE FIELD1: data, FIELD2: more, data, FIELDN: data ....
as input
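while (<>)
{
    chomp;
    # Skip anything that does not start with TYPE; strip the prefix otherwise.
    next unless (s/^TYPE //);
    # The capturing group makes split return each "FIELDn:" separator too.
    my ($lame, @parts) = split /, (\w+:) /;
    print "$lame\n";    # first field, still glued to its data
    while (@parts)
    {
        # @parts alternates field name, field data.
        print shift @parts, ' ', shift @parts, "\n";
    }
    print "\n";
}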
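That takes it apart. Yes, it's called $lame for a reason: the first
field has no ', ' in front of it, so it comes back still glued to its
data, and I didn't feel like fighting split more atm. :-)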
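
If the leftover first chunk is a bother, one possible alternative is to
skip split and match name/data pairs directly. This is only a sketch:
parse_fields is a made-up helper name, and it assumes field names are
single \w+ words and that the data itself never contains something that
looks like ', WORD: '.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): break one "TYPE ..."
# line into a list of [name, data] pairs. Returns nothing for other lines.
sub parse_fields {
    my ($line) = @_;
    chomp $line;
    return unless $line =~ s/^TYPE //;    # ignore non-TYPE lines

    my @pairs;
    # A name is a word followed by ':'. Its data runs lazily up to the
    # next ", NAME: " or the end of the line, so embedded commas survive.
    while ($line =~ /(\w+):\s+(.*?)(?=,\s+\w+:\s|$)/g) {
        push @pairs, [$1, $2];
    }
    return @pairs;
}

my $sample = "TYPE FIELD1: data, FIELD2: more, data, FIELDN: data ....\n";
for my $pair (parse_fields($sample)) {
    print "$pair->[0] => $pair->[1]\n";
}
```

The lookahead only peeks at the next field name instead of consuming
it, which is why every field, including the first, comes out the same
way.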
David