Regular expression help

Joel Pearson pearj at iprimus.com.au
Tue Dec 10 14:59:47 EST 2002


Hi,

Thanks for the info on backreferencing.
Kim sent me another regex, which seems to work really nicely:
/<option value=([A-Z]{3})> \(([^<]+)\)/

And yes, I do only want the data with 3 characters.
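
A minimal sketch of what the pattern above captures, written with Python's re
module rather than PHP (the pattern itself carries over unchanged, minus the
/ delimiters). The sample markup and names below are made up for
illustration, since the real page is not shown in this thread.

```python
import re

# Hypothetical sample markup; the real data is not shown in the thread.
html = (
    "<option value=SYD> (Sydney)&nbsp;</option>\n"
    "<option value=ABCD> (Four letter code)</option>\n"
    "<option value=MEL> (Melbourne (Tullamarine))</option>\n"
)

# Same pattern as in the message, without the surrounding / delimiters.
pattern = re.compile(r"<option value=([A-Z]{3})> \(([^<]+)\)")

# findall returns one (group 1, group 2) tuple per match.
matches = pattern.findall(html)
for code, name in matches:
    print(code, "->", name)
```

The four-letter value is skipped because [A-Z]{3} must be followed directly
by ">". Because [^<]+ is greedy and only stops at "<", a name with one nested
set of brackets is still captured whole.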

Joel

~~~~~~~~~~~~~~~~~~~~~~~
Joel Pearson
Email: pearj at writeme.com
ICQ:1580379
MSN: joelpearson at hotmail.com


-----Original Message-----
From: Paul Bryan [mailto:pa_bryan at yahoo.co.uk] 
Sent: Tuesday, 10 December 2002 2:48 PM
To: pearj at writeme.com; Joel Pearson; 'Linux user group'
Subject: Re: Regular expression help

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, 10 Dec 2002 12:24, Joel Pearson wrote:
> Thanks to everyone who suggested regular expressions to me, they were
> very helpful.
>
>
>
> In the end I decided to go with this regular expression, which should be
> fine as long as there is no more than 1 set of brackets in the name:
>
> /value=(.{3})> \(([^\)\(]*|[^\(]*\([^\)]*\)[^\)]*)\)+/
> -Sent by Brian Graham

Do you only want the first three characters? Just wanted to clarify because
some of your data has a value parameter with 4 or 5 characters, and this will
only match the three-character items. If you want to match all value
parameters irrespective of how many characters there are, try:

/value=([^>]+)>/
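
To make the difference concrete, here is a small sketch comparing the two
capture styles. It uses Python's re module as a stand-in for PHP's preg
functions, and the sample string is invented for illustration.

```python
import re

# Hypothetical data with 3-, 4- and 5-character value parameters.
html = (
    "<option value=SYD> (Sydney)"
    "<option value=ABCD> (Four)"
    "<option value=ABCDE> (Five)"
)

# Exactly three characters, then ">": longer values do not match at all.
three_only = re.findall(r"value=(.{3})>", html)

# Everything up to the next ">": matches values of any length.
any_length = re.findall(r"value=([^>]+)>", html)

print(three_only)
print(any_length)
```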

>
>
>
> Kim's suggestion of: /value=(\([A-Z]{3,4}\))> \(([^<]+)\) \1\(
> \&nbsp\)+<\/option/
>
> Seemed to be the best option, but for some reason or another it didn't
> want to work in php. (Something to do with backreferencing I think)

From the looks of it, your first backreference (array position 1) should be
the value parameter. This will only get 3-4 character parameters though (see
my comment above). The second backreference should be the part inside the
brackets.

Here are a few notes from the PHP manual to help clarify how PHP treats
regexes. There are also a lot of modifiers that you can use (e.g. the
greediness modifier). Check the PHP manual for details.

preg_match_all uses two different systems for returning the backreferences.
From the PHP manual on preg_match_all:

<quote>
flags can be a combination of the following flags (note that it doesn't make
sense to use PREG_PATTERN_ORDER together with PREG_SET_ORDER):

PREG_PATTERN_ORDER

Orders results so that $matches[0] is an array of full pattern matches, 
$matches[1] is an array of strings matched by the first parenthesized 
subpattern, and so on. 

PREG_SET_ORDER

Orders results so that $matches[0] is an array of first set of matches, 
$matches[1] is an array of second set of matches, and so on. 
</quote>

You need to make sure that you're using the flag you want, as it will change
the array structure. Usually I use PREG_SET_ORDER so that, for each match, the
backreferences are indexed together, e.g. $matches[0][0] is the whole
expression and $matches[0][1] is the first backreference. These are for the
first match. The second match is then $matches[1], and so on.
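
The two array layouts can be sketched as follows. This is Python, not PHP, so
it only imitates the shapes that PREG_SET_ORDER and PREG_PATTERN_ORDER
produce; the variable names and sample markup are assumptions made for the
example.

```python
import re

# Hypothetical sample markup.
html = (
    "<option value=SYD> (Sydney)</option>"
    "<option value=MEL> (Melbourne)</option>"
)
pattern = re.compile(r"value=([A-Z]{3})> \(([^<]+)\)")

# Shaped like PREG_SET_ORDER: one entry per match, each holding
# [whole match, group 1, group 2].
set_order = [[m.group(0), *m.groups()] for m in pattern.finditer(html)]

# Shaped like PREG_PATTERN_ORDER: one entry per group, each holding
# that group's text across all matches (index 0 = whole matches).
pattern_order = [list(column) for column in zip(*set_order)]

print(set_order[0][1])      # group 1 of the first match
print(pattern_order[1][0])  # the same string, addressed the other way round
```

The same string is reachable in both layouts; only the order of the two
indexes is swapped.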

From the "pattern syntax" section under "pcre", under the heading
"subpatterns":

<quote>
The fact that plain parentheses fulfil two functions is not always helpful.
There are often times when a grouping subpattern is required without a
capturing requirement. If an opening parenthesis is followed by "?:", the
subpattern does not do any capturing, and is not counted when computing the
number of any subsequent capturing subpatterns. For example, if the string
"the white queen" is matched against the pattern

the ((?:red|white) (king|queen))

the captured substrings are "white queen" and "queen", and are numbered 1
and 2. The maximum number of captured substrings is 99, and the maximum
number of all subpatterns, both capturing and non-capturing, is 200.
</quote>

This should help you figure out exactly what your backreferences are
(referred to above as "capturing subpatterns").
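
The manual's example can be checked directly. The sketch below uses Python's
re module, which supports the same (?:...) non-capturing syntax; the variable
names are only for illustration.

```python
import re

# The pattern and subject string from the manual's example.
m = re.search(r"the ((?:red|white) (king|queen))", "the white queen")

# (?:red|white) groups the alternatives but is not numbered,
# so only two captures come back.
groups = m.groups()
print(groups)
```

Group 1 is the outer parenthesis and group 2 is (king|queen); the (?:...)
group is skipped when the captures are numbered.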

Also check the top of the "pcre > pattern syntax" section to see how these
regexes differ from Perl.

>
>
>
> Thanks again to everyone for your help
>
>
>
> Joel
>
>
>
> ~~~~~~~~~~~~~~~~~~~~~~~
>
> Joel Pearson
>
> Email: pearj at writeme.com
>
> ICQ:1580379 <http://web.icq.com/wwp/1,,,00.html?Uin=1580379>
>
> MSN: joelpearson at hotmail.com
>
> <http://members.msn.com/default.msnw?mem=joelpearson@hotmail.com&pgmarket=en-au>

- -- 
Paul Bryan
E-Mail: pa_bryan at yahoo.co.uk
 
PGP Key
http://www.keyserver.net:11371/pks/lookup?op=get&search=0xB1D405DA
What makes us so bitter against people who outwit us is that they think
themselves cleverer than we are.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE99WQC3qGyTLHUBdoRAnyWAJ94hmmflrt8CErrp020xI3CppMdbgCgmUUH
RovoUZhwzdRBMvNCQjC1r3w=
=F3MA
-----END PGP SIGNATURE-----




