New wildmatch code in CVS
Donovan Baarda
abo at minkirri.apana.org.au
Wed Jul 9 22:02:16 EST 2003
Quoting Wayne Davison <wayned at samba.org>:
> If you've been watching CVS, you may have noticed that I checked in some
> new files named wildmatch.c and wildmatch.h. This code implements the
> shell-style wildcard matching with rsync's extension that "**" matches
[...]
> build farm. One thing I discovered is that the various fnmatch() calls
> all seem to handle the character-class boundary cases in conflicting
> ways (things like "[]-]"). So, one benefit for rsync of switching from
> fnmatch to wildmatch will be to make the callers of this function behave
> consistently on all platforms (which primarily affects the exclude code).
[...]
This might explain why Python implements its own fnmatch.py using regexes.
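For reference, a minimal sketch of what that looks like from the Python side. It uses only the standard library's fnmatch.translate() and re; the file names are made up for illustration, and the "[]-]" pattern is the boundary case mentioned in the quoted text.

```python
import fnmatch
import re

# fnmatch.translate() converts a shell-style pattern into a regex string,
# which is then compiled and matched like any other regex.
matcher = re.compile(fnmatch.translate("*.c"))

print(bool(matcher.match("wildmatch.c")))
print(bool(matcher.match("wildmatch.h")))

# The boundary case from the quoted text: a character class whose
# members are "]" and "-". Because the translation is done in Python,
# the result does not depend on the platform's C fnmatch().
boundary = re.compile(fnmatch.translate("[]-]"))
print([ch for ch in "]-a" if boundary.match(ch)])
```

Since the pattern never reaches the C library, every platform running the same Python version should get the same answer for the boundary case.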
> Anyone have any concerns or comments on switching over to this new code?
Only one concern, a few questions, and maybe a suggestion:
The concern:
Why the name "wildmatch"? It seems a bit too arbitrary to me. I have used the
name "efnmatch" (extended fnmatch) for it in my Python implementations. The
name "wildmatch" is too generic, whereas "efnmatch" clearly indicates it is an
extension to the standard fnmatch. A silly concern, I know, but it will make my
life easier when I start making Python extension modules out of your code to
use in mine :-)
Some Questions:
How did you implement it (I know, I should just look in CVS, but while I'm
typing...)? Does it use regexes or a modified implementation of fnmatch? How
does it compare performance-wise with a regex-based implementation?
The reason I'm curious is that Python, for whatever reason, implements fnmatch in
Python using regexes rather than as a C extension module (possibly to avoid
the fnmatch variations you identified). I'm wondering if it would be worth
re-implementing fnmatch (and efnmatch) as C extension modules.
The maybe suggestion:
I found that by implementing efnmatch using regexes, it was painless to add the
ability to use regexes in include/exclude lists. This meant include/exclude
lists could be built using either efnmatch wildcards or regexes, as they would
all be converted, compiled, and matched as regexes anyway.
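A rough sketch of the idea, not my actual efnmatch code. The names efnmatch_translate, compile_rule and excluded are hypothetical, as is the "re:" prefix used to mark a raw regex. It assumes "*" and "?" stop at "/" while "**" crosses it, and it leaves out character classes to stay short.

```python
import re

def efnmatch_translate(pattern):
    """Translate a wildcard pattern into a regex string (hypothetical helper)."""
    out = []
    i, n = 0, len(pattern)
    while i < n:
        if pattern.startswith("**", i):
            out.append(".*")        # assumed: "**" also matches "/"
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")     # "*" stays within one path component
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")      # "?" is one non-slash character
            i += 1
        else:
            out.append(re.escape(pattern[i]))  # everything else is literal
            i += 1
    return "".join(out)

def compile_rule(rule):
    """Compile one include/exclude rule; "re:" marks a raw regex (assumed syntax)."""
    if rule.startswith("re:"):
        return re.compile(rule[3:])
    return re.compile(efnmatch_translate(rule))

def excluded(path, rules):
    """Return True if any compiled rule matches the whole path."""
    return any(rule.fullmatch(path) for rule in rules)

rules = [compile_rule(r) for r in ("*.o", "build/**", r"re:.*\.(bak|tmp)")]
for path in ("main.o", "src/main.o", "build/a/b.c", "x/y.bak", "main.c"):
    print(path, excluded(path, rules))
```

Once every rule is a compiled regex, the matching loop does not care which syntax the rule was written in, which is why supporting both costs almost nothing.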
I don't know how regex matching compares to fnmatch matching performance-wise.
I'm also aware that people have expressed concerns about linking in/against
largish regex libraries. However, if the option of using regexes for
include/excludes is ever going to happen, then it might be an idea to use them
for this.
Personally, I feel the efnmatch functionality is flexible enough to never
require regexes, but I've seen a few enquiries in the past.
ABO