New wildmatch code in CVS

Donovan Baarda abo at minkirri.apana.org.au
Wed Jul 9 22:02:16 EST 2003


Quoting Wayne Davison <wayned at samba.org>:

> If you've been watching CVS, you may have noticed that I checked in some
> new files named wildmatch.c and wildmatch.h.  This code implements the
> shell-style wildcard matching with rsync's extension that "**" matches
[...]
> build farm.  One thing I discovered is that the various fnmatch() calls
> all seem to handle the character-class boundary cases in conflicting
> ways (things like "[]-]").  So, one benefit for rsync of switching from
> fnmatch to wildmatch will be to make the callers of this function behave
> consistently on all platforms (which primarily affects the exclude code).
[...]

This might explain why Python implements its own fnmatch.py using regexes.

> Anyone have any concerns or comments on switching over to this new code?

Only one concern, a few questions, and maybe a suggestion:

The concern: 

Why the name "wildmatch"? It seems a bit too arbitrary to me. I have used the
name "efnmatch" (extended fnmatch) for it in my Python implementations. The
name "wildmatch" is too generic, whereas "efnmatch" clearly indicates that it
is an extension to the standard fnmatch. A silly concern, I know, but it will
make my life easier when I start making Python extension modules out of your
code to use in mine :-)

Some Questions:

How did you implement it (I know, I should just look in CVS, but while I'm
typing...)? Does it use regexes or a modified implementation of fnmatch? How
does it compare performance-wise with a regex-based implementation?

The reason I'm curious is that Python, for whatever reason, implements fnmatch
in Python using regexes rather than as a C extension (possibly to avoid the
fnmatch variations you identified). I'm wondering if it would be worth
reimplementing fnmatch (and efnmatch) as C extension modules.
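
For anyone who hasn't looked at it, here is a rough illustration of what the
regex approach looks like from the Python side. It only uses the standard
fnmatch and re modules; the file names are made up for the example, and the
exact regex text that translate() produces differs between Python versions,
so I don't show it.

```python
import fnmatch
import re

# fnmatch.translate() turns a shell-style wildcard into a regex string.
# The exact text it returns varies between Python versions.
regex_text = fnmatch.translate("*.c")
compiled = re.compile(regex_text)

# Hypothetical file names, purely for illustration.
matches_c_file = compiled.match("wildmatch.c") is not None
matches_h_file = compiled.match("wildmatch.h") is not None

# Because the matching is done by Python's own regex engine, a
# boundary-case character class such as "[]-]" does not depend on the
# platform's C library fnmatch().
bracket_matches = fnmatch.fnmatchcase("]", "[]-]")
dash_matches = fnmatch.fnmatchcase("-", "[]-]")
letter_matches = fnmatch.fnmatchcase("a", "[]-]")
```

Here "[]-]" is read as a class containing "]" and "-", so the first two
lookups match and the third does not.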

The maybe suggestion:

I found that by implementing efnmatch using regexes, it was painless to add the
ability to use regexes in include/exclude lists. This meant include/exclude
lists could be built using either efnmatch wildcards or regexes, as they would
all be converted, compiled, and matched as regexes anyway.
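
A minimal sketch of the idea, not my actual code: the function names, the
("wild"/"regex", pattern) rule format, and the sample paths are all made up
for illustration. I'm also assuming here that "*" and "?" stop at "/" while
"**" crosses it, and I've left out character classes to keep it short.

```python
import re

def efnmatch_to_regex(pattern):
    """Translate a wildcard pattern to a regex string (hypothetical helper).

    Assumed semantics: "**" matches anything including "/", while "*" and
    "?" do not match "/". Character classes are not handled in this sketch.
    """
    out = []
    i, n = 0, len(pattern)
    while i < n:
        if pattern.startswith("**", i):
            out.append(".*")       # assumed: "**" crosses "/" boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")    # assumed: "*" stops at "/"
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")     # assumed: "?" is one non-"/" character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "".join(out) + r"\Z"    # anchor at the end of the path

def compile_rules(rules):
    """Compile (kind, pattern) pairs; kind is "wild" or "regex"."""
    compiled = []
    for kind, pattern in rules:
        if kind == "regex":
            compiled.append(re.compile(pattern))
        elif kind == "wild":
            compiled.append(re.compile(efnmatch_to_regex(pattern)))
        else:
            raise ValueError("unknown rule kind: %r" % (kind,))
    return compiled

def is_excluded(path, compiled):
    """True if any compiled rule matches from the start of the path."""
    return any(rx.match(path) for rx in compiled)

# Hypothetical exclude list mixing wildcards and a raw regex.
excludes = compile_rules([
    ("wild", "*.o"),
    ("wild", "build/**"),
    ("regex", r".*~\Z"),
])
```

With that list, is_excluded("main.o", excludes) and
is_excluded("build/a/b.txt", excludes) are both true, while
is_excluded("src/main.o", excludes) is false because "*" does not cross the
"/" under the assumption above. Once everything is a compiled regex, the
matching code no longer cares which syntax a rule was written in.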

I don't know how regex matching compares to fnmatch matching performance-wise.
I'm also aware that people have expressed concerns about linking in/against
largish regex libraries. However, if the option of using regexes for
include/excludes is ever going to happen, then it might be an idea to use them
for this.

Personally, I feel the efnmatch functionality is flexible enough to never
require regexes, but I've seen a few enquiries in the past.

ABO


