wildcard matching - 2nd attempt

Andrew Tridgell tridge at linuxcare.com
Tue Jul 27 14:27:19 GMT 1999


NT wildcard matching is weirder than I thought. No wonder we've been
struggling with it for so long. The simple rules I posted yesterday
are not correct.

Today I tried to make a wildcard -> regex converter (using masktest
for testing) so we could use regcomp() and regexec() to test an exact
wildcard matcher. I'm now not at all sure it is possible. I have come
very close, but rules like the following are very hard to express as
regexps.

 - if a '>' or '?' character at position n in a pattern matches a
   character in the eighth position in the filename and the previous
   character in the pattern is a '>' or a '?' and the next character
   in the pattern is the end-of-string or a '.' or a '"' then the nth
   character in the pattern is treated as a '*' instead of a '>' or
   '?'

nice huh? as an example:

'abcdef>>' matches 'abcdefqwertyuiop'
but
'xabcdef>>' does not match 'xabcdefqwertyuiop'

similarly
'?b*f>>.x' matches 'abcdefqwertyuiop.x'
but 
'?b*e>>.x' does not match 'abcdefqwertyuiop.x'
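
The rule above can be written down as a small predicate. This is only a
sketch in Python of that one rule as stated, not a matcher. The name
promotes_to_star and its arguments are made up for illustration,
positions are 0-based (so the eighth filename character is index 7), and
the caller is assumed to already know which filename character the
pattern character lined up with.

```python
def promotes_to_star(pattern, n, file_index):
    """Return True if pattern[n] should be treated as '*' under the rule.

    Hypothetical helper: n is the 0-based index into the pattern and
    file_index is the 0-based index of the filename character that
    pattern[n] matched.
    """
    if pattern[n] not in '>?':
        return False
    # must have matched the eighth character of the filename
    if file_index != 7:
        return False
    # previous pattern character must be '>' or '?'
    if n == 0 or pattern[n - 1] not in '>?':
        return False
    # next pattern character must be end-of-string, '.' or '"'
    return n + 1 == len(pattern) or pattern[n + 1] in '."'


# the four examples, with the alignment of the second '>' worked out by hand
print(promotes_to_star('abcdef>>', 7, 7))
print(promotes_to_star('xabcdef>>', 8, 8))
print(promotes_to_star('?b*f>>.x', 5, 7))
print(promotes_to_star('?b*e>>.x', 5, 6))
```

This prints True, False, True, False, which lines up with the
matches / does not match results in the four examples.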

apart from minor glitches like that I've got some code that translates
the likes of
'?.b"b..<."?".<'
to
'^.{1}[.]b([.]|$)b[.][.].*[.]([.]|$).{1}([.]|$)[.]([^.]*|[^.]*[.]|[.][^.]*|[.].*[.])$'
and gets the right answer when run through a POSIX regex library.
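
For anyone wanting to poke at that regex, here is a rough sketch that
feeds it to a regex engine. It uses Python's re module as a stand-in for
regcomp()/regexec(), on the assumption that the constructs used here
(anchors, bracket expressions, {1}, alternation, *) give the same yes/no
answer in both. The test strings are made up to exercise the regex; they
are not masktest output and say nothing about what NT itself does.

```python
import re

# regex quoted above for the wildcard '?.b"b..<."?".<'
REGEX = (r'^.{1}[.]b([.]|$)b[.][.].*[.]([.]|$).{1}([.]|$)[.]'
         r'([^.]*|[^.]*[.]|[.][^.]*|[.].*[.])$')

compiled = re.compile(REGEX)


def regex_accepts(name):
    """True if the quoted regex accepts the whole string."""
    return compiled.search(name) is not None


# made-up strings, chosen only to hit and miss the regex
for name in ('a.b.b....x..', 'a.b.b....x..foo', 'a.b.c'):
    print(name, regex_accepts(name))
```

The first two strings are accepted and the third is rejected, since
after the first 'b' the regex insists on a '.' followed by another 'b'.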

the regex based code is *much* more accurate than our current code
and has far fewer special cases but still ain't perfect. Given the
stupidity of some of the rules I'm not at all sure that a perfect
match with NT semantics is actually a worthwhile goal!

Cheers, Tridge

PS: The rules given for wildcard matching in the CIFS spec are, of
course, totally different from what NT uses as a few simple tests
soon show. 

