Help with Regular Expression
AZ Pete
plug-discuss@lists.plug.phoenix.az.us
Tue, 03 Dec 2002 22:41:38 -0700
Unfortunately, the regex David provided didn't quite solve the problem.
I'll explain via an example.
If the phrase in question is: hello\nthere
The regex s/[^\n][\n][^\n]/ /g; would result in: hell here
Not only does the newline get removed but one character on either side of
it as well.
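
The behaviour is easy to reproduce. Here is a minimal sketch in Python (chosen only for illustration; the thread itself is about Perl-compatible regex in PHP) showing that the character classes on either side are part of the match, so the replacement overwrites them along with the newline:

```python
import re

text = "hello\nthere"

# [^\n] on each side is part of the match, so "o\nt" is replaced as a whole.
eaten = re.sub(r"[^\n][\n][^\n]", " ", text)
print(repr(eaten))  # 'hell here'
```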
This is what I found works:
s/([^\n])\n([^\n])/\1 \2/g; # non-newline newline non-newline goes to
                            # non-newline space non-newline
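
One caveat, sketched here in Python for illustration (the function names are hypothetical): the capture groups still consume the neighbouring characters, so after one match the search resumes past the second captured character, and a one-character line between two newlines can be skipped. A lookaround-based pattern, which Perl-compatible regex also supports, matches only the newline itself and avoids that:

```python
import re

def join_with_groups(text):
    # Capture-group version: keeps the neighbours by putting them back.
    return re.sub(r"([^\n])\n([^\n])", r"\1 \2", text)

def join_with_lookarounds(text):
    # Lookaround version: only the newline itself is matched and replaced.
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)

print(repr(join_with_groups("hello\nthere")))     # 'hello there'
print(repr(join_with_groups("a\nb\nc")))          # 'a b\nc'
print(repr(join_with_lookarounds("a\nb\nc")))     # 'a b c'
print(repr(join_with_lookarounds("one\n\ntwo")))  # 'one\n\ntwo'
```

Note that the lookaround version also replaces a single newline at the very start or end of the text, which the capture-group version leaves alone; whether that matters depends on the data.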
Thanks for the help, it got me on the right track!!
Peter
At 12/3/02 07:48 PM, you wrote:
>One quick note... you indicated that any of \n, \r, or \r\n should be
>considered a newline, and that you wished to preserve any double
>newlines. The two steps suggested will lose any \r\r doubles. Perhaps
>three steps...
>
># convert \r\n to \n
># convert remaining \r to \n (that is, where \r is a newline on its own)
># remove isolated newlines (David's second step)
>
>This assumes that in a single unit of text to be matched against, \r
>and \n cannot both be standalone newlines (a reasonable assumption, I
>think).
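
The three steps above can be sketched roughly as follows. This is Python for illustration only, `unwrap_newlines` is a hypothetical name, and the third step uses a lookaround pattern in place of David's second step so that the neighbouring characters are left alone and the isolated newline becomes a space:

```python
import re

def unwrap_newlines(text):
    # Hypothetical helper; follows the three steps listed above.
    text = text.replace("\r\n", "\n")  # step 1: \r\n -> \n
    text = text.replace("\r", "\n")    # step 2: remaining lone \r -> \n
    # step 3: replace newlines that are not next to another newline
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)

print(repr(unwrap_newlines("a\r\nb\rc\nd")))        # 'a b c d'
print(repr(unwrap_newlines("para1\r\rpara2")))      # 'para1\n\npara2'
print(repr(unwrap_newlines("para1\r\n\r\npara2")))  # 'para1\n\npara2'
```

As a side effect, paragraph breaks come out normalised to \n\n whatever newline style went in.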
>
>-Alex
>
>PS - I couldn't resist the exercise. I think at least in Perl (I don't
>know about the PHP implementation) [ignore line wrapping...]
>
>s/((?<!(\r\n))(\r\n)(?!(\r\n)))|((?<!(\r(?!\n)))(\r(?!\n))(?!(\r(?!\n))))|(
>(?<!((?<!\r)\n))((?<!\r)\n)(?!((?<!\r)\n)))
>//mg
>
> , but don't do that. Nested negative zero-width assertions are
>amusing, but ugly and slow. :)
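
If a single pass is still wanted, one alternative to nested assertions is to match whole runs of newlines and decide in a callback. This is a different technique from the regex quoted above, sketched in Python with hypothetical names, and it assumes an isolated newline should become a space rather than be deleted. In PHP the same idea would presumably go through preg_replace_callback:

```python
import re

# One newline token: \r\n is tried first so it is not split into \r and \n.
NEWLINE_RUN = re.compile(r"(?:\r\n|\r|\n)+")
ONE_NEWLINE = re.compile(r"\r\n|\r|\n")

def collapse(match):
    run = match.group(0)
    # A run of exactly one newline is a line wrap; longer runs are kept as-is.
    if len(ONE_NEWLINE.findall(run)) == 1:
        return " "
    return run

def unwrap_single_pass(text):
    return NEWLINE_RUN.sub(collapse, text)

print(repr(unwrap_single_pass("a\r\nb\rc\nd")))  # 'a b c d'
print(repr(unwrap_single_pass("p1\r\rp2")))      # 'p1\r\rp2'
```

Unlike the three-step approach, this leaves paragraph breaks in their original newline style.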
>
>--- plug-discuss-request@lists.plug.phoenix.az.us wrote:
>
> > Thanks David!!
> >
> > I was trying to do it all in one shot and was getting some "amusing"
> > results.
> > Your method is much more straightforward and easier to understand.
> > Peter
> >
> > On 3 Dec 2002 at 11:21, David A. Sinck wrote:
> >
> > >
> > >
> > > \_ SMTP quoth az_pete@cactusfamily.com on 12/3/2002 11:04 as having spake thusly:
> > > \_
> > > \_ Hi All,
> > > \_
> > > \_ I seem to be having a lot of trouble with what seems should be a
> > > \_ simple regex.
> > > \_
> > > \_ I have a database full of research paper abstracts and I would
> > > \_ like to strip all newlines from them. This would include \n, \r, and
> > > \_ \r\n characters. However, if there are two consecutive newlines
> > > \_ (i.e. new paragraph) I would like to keep those intact.
> > > \_
> > > \_ I have written the script in PHP to pull each field from the
> > > \_ database, perform said regex and then update the field with the
> > > \_ new data. All I need is a regex that works. I'm using the Perl
> > > \_ compatible regex within PHP.
> > > \_
> > > \_ Any help would be appreciated.
> > >
> > > I'd do two passes for ease of thought:
> > >
> > > s/\r//g; # lose all \r's, regardless
> > >
> > > s/[^\n][\n][^\n]/ /g; # non-newline newline non-newline goes to space
> > >
> > > YMMV.
> > >
> > > Trying to do both in one could prove more amusing and is left as an
> > > exercise for the reader.
> > >
> > > Backups are your friend.
> > >
> > > David
>
>---------------------------------------------------
>PLUG-discuss mailing list - PLUG-discuss@lists.plug.phoenix.az.us
>To subscribe, unsubscribe, or to change your mail settings:
>http://lists.PLUG.phoenix.az.us/mailman/listinfo/plug-discuss