[Bug 2240] Add last-match/short-circuit processing of include/exclude

samba-bugs at samba.org samba-bugs at samba.org
Sat Jan 15 15:35:25 GMT 2005


https://bugzilla.samba.org/show_bug.cgi?id=2240





------- Additional Comments From jerry at samba.org  2005-01-15 08:35 -------
On Thu, Jan 13, 2005, samba-bugs at samba.org wrote:


>> I'm not totally sure what you mean by "arbitrary nesting of matches", but I'll
>> assume that you're talking about being able to override the rules of a
>> .cvsignore file (which is not currently possible) and to string together
>> multiple generic --include-from/--exclude-from files on the command-line and
>> have some of the files specify a rule that will override the rules of a later
>> file (or earlier, depending on the order of the scan).


Yes. Just a typical example. You have the following tree:

  dir1/           (+)
  dir2/           (+)
    dir21/        (-)
    dir22/        (+)
      file221     (+)
      file222     (-)
      file223     (+)
    dir23/        (-)
  dir3/           (+)

Now suppose that everything should be synchronized, with dir2/* excluded
except for dir2/dir22, and with dir2/dir22/file222 also excluded.
With the current "first match" approach such a
synchronization cannot be done at all in a generic way. The only
possibility would be to perform multiple individual synchronizations,
but this becomes impossible if at some intermediate level an arbitrary
number of files or dirs exist. With the "last match" approach such a
synchronization is trivial:

 + *
 - dir2/*
 + dir2/dir22/
 - dir2/dir22/file222

That is also the main reason why almost all modern packet filters
and similar access control mechanisms use the "last match" approach
nowadays: it is a superset of "first match", and with the
"short-circuiting" add-on feature it is also as convenient as "first match".
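
A minimal Python sketch of the example above, for illustration only. It
assumes a deliberately simplified matcher that is NOT rsync's real pattern
matching: a pattern ending in "/" covers that directory and everything
below it, and any other pattern is a shell glob whose "*" may also cross
"/". The names matches, last_match, first_match, RULES and TREE are
hypothetical helpers and do not come from rsync or from the patch.

```python
from fnmatch import fnmatchcase

def matches(pattern, path):
    # Simplified matcher (assumption, not rsync's semantics).
    if pattern.endswith("/"):
        base = pattern.rstrip("/")
        return path == base or path.startswith(base + "/")
    return fnmatchcase(path, pattern)

def last_match(rules, path, default=True):
    # Scan all rules in order; the last rule that matches decides.
    verdict = default
    for sign, pattern in rules:
        if matches(pattern, path):
            verdict = (sign == "+")
    return verdict

def first_match(rules, path, default=True):
    # Scan in order; the first rule that matches decides.
    for sign, pattern in rules:
        if matches(pattern, path):
            return sign == "+"
    return default

# The four rules from the example, in the order given.
RULES = [
    ("+", "*"),
    ("-", "dir2/*"),
    ("+", "dir2/dir22/"),
    ("-", "dir2/dir22/file222"),
]

# The example tree, directories written without a trailing slash.
TREE = [
    "dir1", "dir2", "dir2/dir21", "dir2/dir22",
    "dir2/dir22/file221", "dir2/dir22/file222", "dir2/dir22/file223",
    "dir2/dir23", "dir3",
]

if __name__ == "__main__":
    for path in TREE:
        print("+" if last_match(RULES, path) else "-", path)
```

Under these assumptions last_match reproduces the (+)/(-) marks of the
tree, while first_match applied to the same list in the same order
includes everything, because "+ *" matches first.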


>> [...]
>> As for the order of the includes/excludes, the same logic can be implemented in
>> either order, so we need some other reason besides adding a new short-circuit
>> syntax to change it. For instance, your patch can be thought of as implementing
>> first-match on the short-circuit rules, and falling back to last match on the
>> rest of the rules (and .cvsignore files must get added at the bottom of the list
>> of rules). In the current rsync order this would be implemented as a priority
>> last-match of the short-circuit rules, followed by first-match of the normal
>> rules (and .cvsignore files continue to be added at the top of the list of
>> rules). So, a --last-match option should only be added if we wish to give the
>> user the option of writing their rules in the opposite order, and I'm not sure
>> we need that.
>>
>> As for the implementation, I'd prefer to see one that doesn't always match every
>> name against every item in the list (if we can help it). We can do this by
>> adding a "previous" pointer to the exclude_struct so that it can be scanned in
>> either order. The code would then scan in one direction for just the
>> short-circuit rules (if any exist), and then fall back to scanning in the
>> opposite direction for normal rules. If the --last-match option was still
>> desired, I would make its only effect be to change the order of how the user's
>> items get put into the list (so that the same scanning code could be used for
>> both modes).


Hmmmm... I'm not sure whether I understand what you have in mind here.
In general the "short-circuit" rules are not directly equivalent to the rules
in the "first match" approach. It is correct that a ruleset consisting
of "short-circuit" rules _only_ effectively degrades the "last match"
approach to a "first match" approach. But in almost all practical
situations "short-circuit" and regular rules are _intermixed_ and the
position of rules in the "last match" approach is _not_ arbitrary
(neither with nor without "short-circuit" rules). Hence one cannot
match the rules in an arbitrary order!

I prefer to think about the rules in a "last match" approach as
sub-terms of a left-associative mathematical expression (evaluated from
left to right) on sets where the used operators are "union" (+) and
"difference" (-) [not intersection!]. Such an expression cannot simply be
regrouped as right-associative without changing the resulting set. The
same holds for the rules in the include/exclude lists.
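
To make the set view concrete, here is a small Python sketch using the
example tree from earlier in this mail. The sets U, A, B and C are
hypothetical stand-ins for what "+ *", "- dir2/*", "+ dir2/dir22/" and
"- dir2/dir22/file222" would select, and the right-hand grouping is just
one possible regrouping chosen for illustration.

```python
# All paths of the example tree (what "+ *" selects).
U = {
    "dir1", "dir2", "dir2/dir21", "dir2/dir22",
    "dir2/dir22/file221", "dir2/dir22/file222", "dir2/dir22/file223",
    "dir2/dir23", "dir3",
}

# What "dir2/*" is assumed to select: everything below dir2.
A = {p for p in U if p.startswith("dir2/")}

# What "dir2/dir22/" is assumed to select: the directory and its contents.
B = {p for p in U if p == "dir2/dir22" or p.startswith("dir2/dir22/")}

# What "dir2/dir22/file222" selects.
C = {"dir2/dir22/file222"}

# Left-associative evaluation, rule by rule: ((U - A) + B) - C
left = ((U - A) | B) - C

# One right-associative regrouping of the same terms: U - (A + (B - C))
right = U - (A | (B - C))

if __name__ == "__main__":
    print(sorted(left))
    print(sorted(right))
```

The left-associative form keeps dir2/dir22 with file221 and file223,
whereas the regrouped form drops everything below dir2, so the two
groupings do not yield the same set.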

So, while in the "first match" approach all rules are more or less equal
and can most of the time be matched in an arbitrary order,
in the "last match" approach the rules can be matched _only_ in the
specified order. These ordering semantics are what make the "last
match" approach a lot more flexible and powerful. The "short-circuit"
add-on feature is just the usual convenience added to "last
match" approaches to make the rule list easier to read (because one does
not have to collect all "short-circuit" rules at the end of the rule
list and can instead keep them together with other rules applying to
similar elements).
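
A rough Python sketch of how such a "short-circuit" flag could behave
inside a last-match scan. The boolean third field of each rule, the
evaluate function and the simplified matcher are assumptions made for
illustration; they are not the syntax or the code of the actual patch.

```python
from fnmatch import fnmatchcase

def matches(pattern, path):
    # Simplified matcher (assumption, not rsync's semantics): a trailing
    # "/" covers the directory and everything below it; otherwise the
    # pattern is a shell glob whose "*" may also cross "/".
    if pattern.endswith("/"):
        base = pattern.rstrip("/")
        return path == base or path.startswith(base + "/")
    return fnmatchcase(path, pattern)

def evaluate(rules, path, default=True):
    # Last match wins, except that a matching short-circuit rule
    # ends the scan immediately.
    verdict = default
    for sign, pattern, short_circuit in rules:
        if matches(pattern, path):
            verdict = (sign == "+")
            if short_circuit:
                break
    return verdict

TREE = [
    "dir1", "dir2", "dir2/dir21", "dir2/dir22",
    "dir2/dir22/file221", "dir2/dir22/file222", "dir2/dir22/file223",
    "dir2/dir23", "dir3",
]

# Plain last match: the overriding exclusion has to come last.
AT_END = [
    ("+", "*", False),
    ("-", "dir2/*", False),
    ("+", "dir2/dir22/", False),
    ("-", "dir2/dir22/file222", False),
]

# With a short-circuit rule the same exclusion may be placed up front.
UP_FRONT = [
    ("-", "dir2/dir22/file222", True),
    ("+", "*", False),
    ("-", "dir2/*", False),
    ("+", "dir2/dir22/", False),
]

# Every rule short-circuits: the scan behaves like "first match".
ALL_SHORT = [(sign, pattern, True) for sign, pattern, _ in AT_END]

if __name__ == "__main__":
    for path in TREE:
        print(evaluate(AT_END, path), evaluate(UP_FRONT, path), path)
```

In this sketch AT_END and UP_FRONT give the same verdict for every path
of the example tree, and ALL_SHORT includes everything because "+ *"
matches first and stops the scan.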

But perhaps I still do not understand exactly how you think the two
approaches can be mixed together. IMHO they can't, and that was the
reason why I used an explicit option --last-match to switch between the
approaches.


>> I'm currently considering some changes to the include/exclude code: namely a
>> modified version of the current merge-exclude-file.diff in the patches dir, but
>> with the syntax of the include-rule lines changed. Thus, the addition of a new
>> "overriding" include/exclude idiom would go well with this.


I still have not looked at this patch, but whatever you do to improve
rsync, please keep in mind that in the end it should be possible
to express the "left-associative mathematical expression on file
sets where the operators used are union and difference". What the
syntax looks like, whether an explicit option has to enable it, etc. is
not important. What is important is being able to perform such more
powerful and flexible synchronizations.

                                       Ralf S. Engelschall
                                       rse at engelschall.com
                                       www.engelschall.com


