feedback on rsync-HEAD-20050125-1221GMT

Fri Jan 28 20:42:25 GMT 2005

Chris Shoemaker wrote:

> If I understand Wayne's design, it would be possible to invent a
> (per-directory) "hook" rule, whose value is executed, and whose stdout
> is parsed as a [in|ex]clude file list.  E.g.:
> 
>  -R "cat .rsync-my-includes"
> 
> or
> 
>  -R "find . -ctime 1 -a ! -fstype nfs -a ! -empty -o -iname 'foo*'"

This is certainly a very powerful mechanism, but it definitely should 
not be the only way we implement file filtering.  Two problems:

1. Sprinkling rule files like these across directories would mean
constantly executing external programs, once for each file to be
considered.  This would presumably slow down rsync's execution by an
order of magnitude or so and suck the life out of a system doing a big
backup job.

2. Who actually needs such a powerful yet hard-to-handle mechanism?
Most of rsync's users are not programmers, and even the few of us who
are apparently still get confused by rsync's include/exclude logic,
never mind even more complicated approaches.

> IMHO, rsync already has too much of its own "filtering" functionality,
> and needs less, not more.  But maybe a hook like this that lets users
> interface with their own filtering program is a step toward
> deprecating rsync's [in|ex]clude[-from] options.
> 
> Notice that a generic include and exclude hooks immediately obsoletes
> the --*-from options and the --*=PATTERN options.  (rsync needs fewer
> options, ya see? :)

I totally agree with you.  Having now read the description of the
--filter option in the manpage in CVS (duh!), I think what Wayne is
working on is right on the money and will satisfy 95% of rsync's power
users (most regular users' needs are already met by the current
include/exclude rules).

>>Wayne Davison wrote:
>>
>>
>>>It already supports per-directory name rules, both inherited and not.
>>>The idea of having per-directory size and time limits would not be hard
>>>to add, and may be quite worthwhile.  For instance, assume 's' is for
>>>size and 't' is for the modified time:
>>>
>>>   # Don't transfer files 1 GB or larger
>>>   s< 1g
>>>   # Don't transfer files 100 KB or smaller
>>>   s> 100k
>>>   # Only transfer new files (modified in the last day)
>>>   t> yesterday
>>>
>>>Something like that, perhaps.
> 
> 
> We don't really want to reinvent 'find', do we?

Well, no, that's why I was advocating adopting its syntax and reusing
its code so that rsync can do similar operations on a per-directory
basis, but as I said, maybe this is already overkill.  For performance
reasons, though, I am against a solution that would execute find as an
external program for each file considered.  If you really need complete
freedom, maybe the way to go is to do your file selection first and use
--files-from.  The reason to implement a good --filter option is that it
sits in a sweet spot between the --include/exclude and the --files-from
scenarios: it still lets rsync do all the work of figuring out the file
list with just a little effort from the user.  The real challenge is
making this powerful without making it too complicated, because
otherwise nobody will use it.
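
To make the "select first, then use --files-from" route concrete, here
is a rough sketch.  The function names select_recent and sync_recent
are made up for illustration, and the "modified in the last day"
criterion is only an example.  It assumes a find that supports -print0
(GNU and BSD find do) and an rsync that has --files-from and --from0.
find runs once for the whole tree, not once per file.

```shell
# select_recent SRC
# Print NUL-terminated paths, relative to SRC, of the regular files
# modified within the last 24 hours.  (Hypothetical helper name.)
select_recent() {
    (cd "$1" && find . -type f -mtime -1 -print0)
}

# sync_recent SRC DEST
# Feed that list to rsync on stdin.  --from0 tells rsync the list is
# NUL-terminated; the listed paths are taken relative to SRC.
sync_recent() {
    select_recent "$1" | rsync -a --from0 --files-from=- "$1" "$2"
}
```

Something like "sync_recent /home/me /backup/me" would then transfer
only the selected files.  The price is that the selection logic lives
outside rsync, which is exactly the extra effort a good --filter option
would save.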

-- Alberto