Help with Regular Expression
Alexander LeDonne
plug-discuss@lists.plug.phoenix.az.us
Tue, 3 Dec 2002 18:48:56 -0800 (PST)
One quick note... you indicated that any of \n, \r, or \r\n should be
considered a newline, and that you wished to preserve any double
newlines. The two steps suggested will lose any \r\r doubles. Perhaps
three steps...
# convert \r\n to \n
# convert remaining \r to \n (that is, where \r is a newline on its own)
# remove isolated newlines (David's second step)
This assumes that in a single unit of text to be matched against, \r
and \n cannot both be standalone newlines (a reasonable assumption, I
think).
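
The three steps above can be sketched as follows. This is a minimal
illustration in Python rather than Perl or PHP, chosen only so that it
can be run standalone. The function name strip_single_newlines is
hypothetical, and replacing an isolated newline with a space (instead
of deleting it) is an assumption taken from the intent of David's
second step. The lookaround pattern in step 3 should carry over to
Perl and to PHP's preg_replace, but treat that as an assumption too.

```python
import re


def strip_single_newlines(text: str) -> str:
    """Join wrapped lines but keep paragraph breaks (hypothetical helper)."""
    # Step 1: convert \r\n to \n
    text = text.replace("\r\n", "\n")
    # Step 2: convert remaining \r to \n (where \r is a newline on its own)
    text = text.replace("\r", "\n")
    # Step 3: a newline with no newline directly before or after it is
    # "isolated"; replace it with a space (assumption: space, not deletion).
    # Lookarounds are zero-width, so neighbouring characters are untouched.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


if __name__ == "__main__":
    # The \r\n is a single line break; the \r\r is a paragraph break.
    print(repr(strip_single_newlines("one\r\ntwo\r\rthree")))
    # prints 'one two\n\nthree'
```

Note that paragraph breaks come out normalized to \n\n whichever
newline convention the input used, so \r\r doubles survive.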
-Alex
PS - I couldn't resist the exercise. I think this works, at least in
Perl (I don't know about the PHP implementation) [ignore line wrapping...]
s/((?<!(\r\n))(\r\n)(?!(\r\n)))|((?<!(\r(?!\n)))(\r(?!\n))(?!(\r(?!\n))))|((?<!((?<!\r)\n))((?<!\r)\n)(?!((?<!\r)\n)))
//mg
, but don't do that. Nested negative zero-width assertions are
amusing, but ugly and slow. :)
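
For comparison, here is a sketch of a different single-pass approach
that avoids nested assertions. It is not the regex above: it matches
each whole run of newlines and lets a callback decide what to do with
it. It is written in Python only so that it can be run standalone, the
names NEWLINE_RUN, collapse_run and join_wrapped_lines are hypothetical,
and it assumes an isolated newline should become a space.

```python
import re

# One or more consecutive newlines; \r\n is listed first so that it is
# treated as a single newline rather than as \r followed by \n.
NEWLINE_RUN = re.compile(r"(?:\r\n|\r|\n)+")
ONE_NEWLINE = re.compile(r"\r\n|\r|\n")


def collapse_run(match: "re.Match[str]") -> str:
    run = match.group(0)
    count = len(ONE_NEWLINE.findall(run))
    # A run of exactly one newline is a wrapped line: join with a space.
    # Longer runs are paragraph breaks and are kept exactly as found.
    return " " if count == 1 else run


def join_wrapped_lines(text: str) -> str:
    return NEWLINE_RUN.sub(collapse_run, text)
```

Unlike the three-step version, this leaves paragraph breaks in their
original form (a \r\r stays \r\r) instead of normalizing them to \n\n.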
--- plug-discuss-request@lists.plug.phoenix.az.us wrote:
> Thanks David!!
>
> I was trying to do it all in one shot and was getting some "amusing"
> results.
> Your method is much more straightforward and easier to understand.
> Peter
>
> On 3 Dec 2002 at 11:21, David A. Sinck wrote:
>
> >
> >
> > \_ SMTP quoth az_pete@cactusfamily.com on 12/3/2002 11:04 as having spake thusly:
> > \_
> > \_ Hi All,
> > \_
> > \_ I seem to be having a lot of trouble with what seems should be a
> > \_ simple regex.
> > \_
> > \_ I have a database full of research paper abstracts and I would like
> > \_ to strip all newlines from them. This would include \n, \r, and
> > \_ \r\n characters. However, if there are two consecutive newlines
> > \_ (i.e. new paragraph) I would like to keep those intact.
> > \_
> > \_ I have written the script in PHP to pull each field from the
> > \_ database, perform said regex and then update the field with the new
> > \_ data. All I need is a regex that works. I'm using the Perl
> > \_ compatible regex within PHP.
> > \_
> > \_ Any help would be appreciated.
> >
> > I'd do two passes for ease of thought:
> >
> > s/\r//g; # lose all \r's, regardless
> >
> > s/(?<!\n)\n(?!\n)/ /g; # a newline with no newline on either side goes to space
> >
> > YMMV.
> >
> > Trying to do both in one could prove more amusing and is left as an
> > exercise for the reader.
> >
> > Backups are your friend.
> >
> > David