wildcards (was Re: a problem I'm having with rsync-4.5.4)

Donovan Baarda abo at minkirri.apana.org.au
Wed May 8 18:07:02 EST 2002


On Wed, May 08, 2002 at 10:01:12AM -0700, Wayne Davison wrote:
> On Wed, 8 May 2002, Dave Dykstra wrote:
> > And in fact I think the non-wildcard-matching code actually succeeds,
> > doesn't it?
> 
> Yes, sorry for the unclear sentence.
> 
> > I doubt it's worth trying to fix the fnmatch() code, because fnmatch
> > is a standard function and it would be a lot of work to maintain our
> > own modified version.

FWIW, I wrote my own rsync-like "efnmatch()" (extended fnmatch) function in
Python that simply translates the pattern into a regex and compiles it. It
may not be as fast as a hand-written matcher, but it is certainly a simple
way to do it if you are worried about maintenance (I'm referring to the
algorithm, not the Python language). I've attached it for reference.

-- 
----------------------------------------------------------------------
ABO: finger abo at minkirri.apana.org.au for more info, including pgp key
----------------------------------------------------------------------
-------------- next part --------------
"""Filename matching with extended shell patterns.

efnmatch(FILENAME, PATTERN) matches according to the local convention.
efnmatchcase(FILENAME, PATTERN) always takes case into account.

The functions operate by translating the pattern into a regular
expression.  They cache the compiled regular expressions for speed.

The function translate(PATTERN) returns a regular expression
corresponding to PATTERN.  (It does not compile it.)
"""

import re

_cache = {}

def efnmatch(name, pat):
    """Test whether FILENAME matches PATTERN.

    Patterns are an extended Unix shell style:

    **      matches everything including os.sep
    *       matches everything except os.sep
    ?       matches any single character except os.sep
    ??      matches any single character including os.sep
    [seq]   matches any character in seq
    [!seq]  matches any char not in seq

    An initial period in FILENAME is not special.
    Both FILENAME and PATTERN are first case-normalized
    if the operating system requires it.
    If you don't want this, use efnmatchcase(FILENAME, PATTERN).
    """
    import os
    name = os.path.normcase(name)
    pat = os.path.normcase(pat)
    return efnmatchcase(name, pat)

def efnmatchcase(name, pat):
    """Test whether FILENAME matches PATTERN, including case.

    This is a version of efnmatch() which doesn't case-normalize
    its arguments.
    """
    if pat not in _cache:
        res = translate(pat)
        _cache[pat] = re.compile(res)
    return _cache[pat].match(name) is not None

def translate(pat,sep=None):
    """Translate a shell PATTERN to a regular expression.

    There is no way to quote meta-characters.
    """
    import os,string
    if not sep: sep=os.sep
    sep=re.escape(sep)

    i, n = 0, len(pat)
    res = ''
    while i < n:
        c,s = pat[i],pat[i:i+2]
        i = i+1
        if s == '**':
            res = res + '.*'
            i = i + 1
        elif c == '*':
            res = res + '[^' + sep + ']*'
        elif s == '??':
            res = res + '.'
            i=i+1
        elif c == '?':
            res = res + '[^' + sep + ']'
        elif c == '[':
            j = i
            if j < n and pat[j] == '!':
                j = j+1
            if j < n and pat[j] == ']':
                j = j+1
            while j < n and pat[j] != ']':
                j = j+1
            if j >= n:
                res = res + '\\['
            else:
                # Double any backslashes so they stay literal inside the class.
                stuff = pat[i:j].replace('\\','\\\\')
                i = j+1
                if stuff[0] == '!':
                    stuff = '^' + stuff[1:]
                elif stuff[0] == '^':
                    stuff = '\\' + stuff
                res = res + '[' + stuff + ']'
        else:
            res = res + re.escape(c)
    return res + "$"

