Non greedy pattern match in sed

Paul Bryan pa_bryan at yahoo.co.uk
Tue Aug 13 22:32:45 EST 2002


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ahh... I see. My bad. Didn't realise the "?" had that much of an effect. 

I stand corrected.

Cheers,
Paul.
On Tue, 13 Aug 2002 22:03, Kim Holburn wrote:
> At 9:35 PM +1000 2002/08/13, Paul Bryan wrote:
> >-----BEGIN PGP SIGNED MESSAGE-----
> >Hash: SHA1
> >
> >On Tue, 13 Aug 2002 21:15, Kim Holburn wrote:
> >> If you use perl you can do this. From the perl manual (man perlre)
> >>
> >>        By default, a quantified subpattern is "greedy", that is,
> >>        it will match as many times as possible (given a particu-
> >>        lar starting location) while still allowing the rest of
> >>        the pattern to match.  If you want it to match the minimum
> >>        number of times possible, follow the quantifier with a
> >>        "?".  Note that the meanings don't change, just the
> >>        "greediness":
> >>
> >>            *?     Match 0 or more times
> >>            +?     Match 1 or more times
> >>            ??     Match 0 or 1 time
> >>            {n}?   Match exactly n times
> >>            {n,}?  Match at least n times
> >>            {n,m}? Match at least n but not more than m times
> >>
> >>  .... | perl -p -e 's/\/\*.*?\*\//g'
> >
> >I don't think this is quite right because the regex isn't matching
> > mulitple instances of ".*"
>
> Well,
> 1) it works
> 2) it is matching multiple instances of "." -  "*" means multiple instances
> of (whatever came first in this case ".") .  .*? means the smallest
> possible multiple instance of "." .
>
> Here is what it says in my perl book:
>
> If you say /.*foo/, for example, it will try to match the maximal number of
> "any" characters (represented by the dot) clear out to the end of the line
> before it ever tries looking for "foo", and then when the "foo" doesn't
> match there (and it can't, because there's not enough room for it at the
> end of the string), the Engine will back off one character at a time until
> it finds a "foo".  If there is more than one "foo" in the line, it'll stop
> on the last one, and throw away all the shorter choices it could have made.
>
> By placing a question mark after any of the greedy quantifiers, they can be
> made to choose the smallest quantity for the first try.  So if you say
> /.*?foo/, the .*? first tries to match 0 characters, then 1 character, then
> 2, and so on until it can match the "foo".  Instead of backtracking
> backward, it backtracks forward, so to speak, and ends up finding the first
> "foo" on the line instead of the last.
>
> >What it's doing is looking for the last occurance of "*/". So using a
> >quantifier like "?" isn't going to make any difference because it's the
> > last "*/" that's causing the problem.
> >
> >In other words in the command:
> >
> >echo "/* This is a */ test /* line */" | sed 's/\/\*.*\*\//g'
> >
> >it finds "*/" before "test", but doesn't stop there! This is where the
> > greedy part comes in to it. It keeps on looking to see if there is
> > another occurance of "*/", which it finds at the end of the line. Of
> > course ".*" still matches everything in between including the first
> > occurance of "*/".
> >
> >Just thought I should try and clear that one up.
> >
> >Cheers,
> >Paul.
> >
> >p.s.
> >
> >PHP has the "U" pattern modifier which specifies ungreedy matches. This
> > stops looking once it finds the first "*/". The only problem there is
> > that the ".*" part of the pattern will match the rest of the line
> > including the second comment.
> >- --
> >Paul Bryan
> >E-Mail: pa_bryan at yahoo.co.uk
> >
> >PGP Key
> >http://www.keyserver.net:11371/pks/lookup?op=get&search=0xB1D405DA
> >
> >It's always darkest just before it gets pitch black.
> >-----BEGIN PGP SIGNATURE-----
> >Version: GnuPG v1.0.6 (GNU/Linux)
> >Comment: For info see http://www.gnupg.org
> >
> >iD8DBQE9WO703qGyTLHUBdoRAuWRAKCJPz/Z5yl42Ay8X6xHnf/ZO0G7FACgqvZR
> >RQ56SE0Tt8zpWxypoaNai0E=
> >=4loX
> >-----END PGP SIGNATURE-----

- -- 
Paul Bryan
E-Mail: pa_bryan at yahoo.co.uk

PGP Key
http://www.keyserver.net:11371/pks/lookup?op=get&search=0xB1D405DA

"Where shall I begin, please your Majesty?" he asked.  "Begin at the 
beginning,"
the King said, gravely, "and go on till you come to the end: then stop."
Alice's Adventures in Wonderland, Lewis Carroll
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE9WPxv3qGyTLHUBdoRAp/uAJwM+TyIRqdJaSgG7RO/8uHdGbhf8wCfYgCt
Qrz0QEtvTE3+qgzVejepTkY=
=fPUX
-----END PGP SIGNATURE-----



More information about the linux mailing list