wildcards (was Re: a problem I'm having with rsync-4.5.4)
Donovan Baarda
abo at minkirri.apana.org.au
Wed May 8 18:07:02 EST 2002
On Wed, May 08, 2002 at 10:01:12AM -0700, Wayne Davison wrote:
> On Wed, 8 May 2002, Dave Dykstra wrote:
> > And in fact I think the non-wildcard-matching code actually succeeds,
> > doesn't it?
>
> Yes, sorry for the unclear sentence.
>
> > I doubt it's worth trying to fix the fnmatch() code, because fnmatch
> > is a standard function and it would be a lot of work to maintain our
> > own modified version.
FWIW, I wrote my own rsync-like "efnmatch()" (extended fnmatch) function in
Python that simply builds and compiles a regex. It may not be as fast as
a hand-coded matcher, but it is certainly a simple way to do it if you are
worried about maintenance (I'm referring to the algorithm, not the Python
language). I've attached it for reference.
--
----------------------------------------------------------------------
ABO: finger abo at minkirri.apana.org.au for more info, including pgp key
----------------------------------------------------------------------
-------------- next part --------------
"""Filename matching with extended shell patterns.

efnmatch(FILENAME, PATTERN) matches according to the local convention.
efnmatchcase(FILENAME, PATTERN) always takes case into account.

The functions operate by translating the pattern into a regular
expression.  They cache the compiled regular expressions for speed.

The function translate(PATTERN) returns a regular expression
corresponding to PATTERN.  (It does not compile it.)
"""

import re

# Maps each pattern string to its compiled regular expression.
_cache = {}

def efnmatch(name, pat):
    """Test whether FILENAME matches PATTERN.

    Patterns are an extended Unix shell style:

    **      matches everything including os.sep
    *       matches everything except os.sep
    ?       matches any single character except os.sep
    ??      matches any single character including os.sep
    [seq]   matches any character in seq
    [!seq]  matches any char not in seq

    An initial period in FILENAME is not special.
    Both FILENAME and PATTERN are first case-normalized
    if the operating system requires it.
    If you don't want this, use efnmatchcase(FILENAME, PATTERN).
    """
    import os
    name = os.path.normcase(name)
    pat = os.path.normcase(pat)
    return efnmatchcase(name, pat)

def efnmatchcase(name, pat):
    """Test whether FILENAME matches PATTERN, including case.

    This is a version of efnmatch() which doesn't case-normalize
    its arguments.
    """
    # Compile each distinct pattern only once.
    if pat not in _cache:
        res = translate(pat)
        _cache[pat] = re.compile(res)
    return _cache[pat].match(name) is not None

def translate(pat, sep=None):
    """Translate a shell PATTERN to a regular expression.

    There is no way to quote meta-characters.
    """
    import os
    if not sep:
        sep = os.sep
    sep = re.escape(sep)
    i, n = 0, len(pat)
    res = ''
    while i < n:
        # c is the current character, s the current two-character window.
        c, s = pat[i], pat[i:i+2]
        i = i + 1
        if s == '**':
            # '**' crosses directory boundaries.
            res = res + '.*'
            i = i + 1
        elif c == '*':
            # A single '*' stops at the separator.
            res = res + '[^' + sep + ']*'
        elif s == '??':
            # '??' is one character, separator included.
            res = res + '.'
            i = i + 1
        elif c == '?':
            res = res + '[^' + sep + ']'
        elif c == '[':
            # Find the closing ']'; a leading '!' or ']' belongs to the set.
            j = i
            if j < n and pat[j] == '!':
                j = j + 1
            if j < n and pat[j] == ']':
                j = j + 1
            while j < n and pat[j] != ']':
                j = j + 1
            if j >= n:
                # No closing bracket: treat '[' as a literal.
                res = res + '\\['
            else:
                stuff = pat[i:j].replace('\\', '\\\\')
                i = j + 1
                if stuff[0] == '!':
                    stuff = '^' + stuff[1:]
                elif stuff[0] == '^':
                    stuff = '\\' + stuff
                res = res + '[' + stuff + ']'
        else:
            res = res + re.escape(c)
    return res + "$"
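
To illustrate the difference between '*' and '**', here is a small sketch that does not call the module itself. It uses two hand-written regexes of the shape translate() should produce for the patterns '*.c' and '**.c', assuming the separator is '/'. The names single_star, double_star and matches are made up for this example only.

```python
import re

# Hand-written regexes of the shape translate() is expected to build when
# the separator is '/' (an assumption made for this illustration).
single_star = re.compile(r'[^/]*\.c$')   # corresponds to pattern '*.c'
double_star = re.compile(r'.*\.c$')      # corresponds to pattern '**.c'


def matches(regex, name):
    """Mirror efnmatchcase(): match() anchors the start, '$' anchors the end."""
    return regex.match(name) is not None


print(matches(single_star, 'main.c'))       # True
print(matches(single_star, 'src/main.c'))   # False: '*' stops at '/'
print(matches(double_star, 'src/main.c'))   # True: '**' crosses '/'
```

The same pairing applies to '?' and '??': the single form becomes a negated character class that excludes the separator, and the doubled form becomes a plain '.'.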