feedback on rsync-HEAD-20050125-1221GMT
Alberto Accomazzi
aaccomazzi at cfa.harvard.edu
Mon Jan 31 16:04:32 GMT 2005
Hi Chris,
Chris Shoemaker wrote:
> On Fri, Jan 28, 2005 at 03:42:25PM -0500, Alberto Accomazzi wrote:
>
>>Chris Shoemaker wrote:
>>
>>
>>>If I understand Wayne's design, it would be possible to invent a
>>>(per-directory) "hook" rule, whose value is executed, and whose stdout
>>>is parsed as a [in|ex]clude file list. E.g.:
>>>
>>>-R "cat .rsync-my-includes"
>>>
>>>or
>>>
>>>-R "find . -ctime 1 -a ! -fstype nfs -a ! -empty -o iname 'foo*'"
>>
>>This is certainly a very powerful mechanism, but it definitely should
>>not be the only way we implement file filtering. Two problems:
>>
>>1. Sprinkling rule files like these across directories would mean
>>executing external programs all the time for each file to be considered.
>
>
> No, only one execution per specified rule. Most users of this feature
> would specify one rule at the root directory. But, if a user
> wanted to change the rules for every directory, they would have to
> specify a rule in each directory. Then, yes, one execution per
> directory. Presumably they would do this because they actually need
> to. Never one execution per file.
Ok, I guess I had misunderstood your original suggestion. One execution
per directory is presumably not so bad, although it's hard to make
assumptions about how one's data hierarchy is structured.
>> This would presumably slow down rsync's execution by an order of
>>magnitude or so and suck the life out of a system doing a big backup job.
>
>
> If you're referring to process spawning overhead, it's no big deal.
> If you're referring to the actual work required to return the file
> list, what makes you think that rsync can do it more efficiently than
> 'cat' or 'find', or whatever tool the user chose?
I was referring to the overhead of spawning a process per file being
considered. But I think we all agree that this is neither desirable
nor necessary.
>>2. Who actually needs such a powerful yet hard-to-handle
>>mechanism? Most of rsync's users are not programmers, and even the few
>>of us who are apparently still get confused by rsync's include/exclude
>>logic, never mind even more complicated approaches.
>
>
> Do you mean include/exclude mechanism or filtering mechanism? Well,
> IMO, parsing a file list is *less* complicated than rsync's custom
> pattern specification and include/exclude chaining. Actually, I think
> rsync patterns are /crazy/ complicated and fully deserve the pages
> upon pages of documentation, explanation and examples that they get in
> the man page.
>
> But, complexity is somewhat subjective, so I won't argue (much) about
> it. In practice, /familiarity/ is far more important than complexity
> in a case like this. Someone who looks at rsync for the first time
> has a _zero_ chance of having seen something like rsync's patterns
> before, because there is nothing else like them.
I agree that exclude/include patterns can be tricky, and you have a good
point about familiarity versus complexity. I think what makes them hard
to handle is the fact that we are dealing with filename (and directory
name) matching and recursion. So matching only a subset of a file tree,
while simple as a concept, is non-trivial once you sit down and realize
that you need a well-defined syntax for it. Can you write a find
expression that is simpler or more familiar to the average user than
rsync's include/exclude rules?
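Just to make the comparison concrete, here is the kind of thing I have
in mind (an untested sketch, with made-up host and path names): picking
up only the *.c files under a tree.

    # the "unix way": find selects the files, rsync just copies the
    # list it reads from stdin
    find src -name '*.c' -type f | rsync -a --files-from=- . backuphost:backup/

    # the pure include/exclude way: descend into every directory, keep
    # *.c, drop everything else (all directories still get created)
    rsync -a --include='*/' --include='*.c' --exclude='*' src/ backuphost:backup/src/

Whether the first form really is more familiar to the average user than
the second, with its '*/' rule and order-dependent matching, is exactly
what I'm asking.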
> (The allusion to GNU
> tar's --exclude option which takes only a filename, not a pattern,
> isn't really helpful in understanding rsync's --exclude option.)
Uh? Tar does take patterns for exclusion, and has its own quirky way of
dealing with wildcards, directory matching and filename anchoring:
http://www.gnu.org/software/tar/manual/html_node/tar_100.html
> It's not that pattern matching for file selection isn't complex --
> it's just that it's such a well-defined, conceptually simple, common
> task that other tools (like 'find' and 'bash') handle better than
> rsync ever will. And that's the way it should be: it's the unix way.
I agree that this is something we should be striving for as much as
possible: pipeline and offload tasks rather than bloating applications.
>>If you really need
>>complete freedom maybe the way to go is to do your file selection first
>>and use --files-from.
>
>
> Yes, --files-from is nice, and honestly, almost completely sufficient.
> But in some dynamic cases, you can't keep the list updated.
Well, maybe we should go back and see if the solution to all problems
isn't making --files-from sufficient. What exactly is missing from it
right now? The capability to delete files which are not in the
files-from list? Or the remote execution of a command that can generate
the files-from list for an rsync server? Maybe we ought to really
figure out what things cannot be achieved with the current functionality
before coming up with something new.
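Just so we have a concrete baseline, this is roughly what --files-from
already lets us do today (an untested sketch, host and path names made
up): the list can come from any command via stdin, but that command has
to run on the side invoking rsync, and files missing from the list are
simply left alone on the receiver.

    # push only the regular files changed since the last run; the list
    # is NUL-separated (--from0 matches find's -print0) to survive odd names
    find /data -type f -newer /var/run/last-backup -print0 | \
        rsync -a --from0 --files-from=- / backuphost:/backups/

What I don't see a way to express is "and remove from the receiver
whatever is not in this list", nor a way to have a daemon run the
list-generating command on its own side; those look like the two real
gaps to me.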
>>challenge is making this powerful without making it too complicated,
>>because in that case nobody will use it.
>
>
> You see --filter as less complicated than --include/exclude, then?
> It's certainly more powerful.
Since --filter supports a superset of the file selection rules that
--include/exclude supports, it's certainly more complicated than
include/exclude, but not by much: I still think the trickiest part of
the file selection rules for the average user will be pattern matching.
The other big issue looming is the logic used for
nesting/inheriting/overriding file selection rules. I'm really worried
that these interactions can easily get out of hand.
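For example, if I read the pre-release documentation correctly (so
treat this as a sketch of my understanding, not of the final syntax), a
per-directory merge file both adds to and overrides the rules it
inherits from its parent directories:

    # have rsync look for a .rsync-filter file in every directory it scans
    rsync -a --filter='dir-merge .rsync-filter' src/ backuphost:backup/src/

    # src/.rsync-filter -- inherited by every subdirectory
    - *.o
    - *.tmp

    # src/thirdparty/.rsync-filter -- these rules take priority over the
    # inherited ones, so important.tmp is transferred despite "- *.tmp"
    + important.tmp
    - logs/

It is exactly this interaction between inherited and local rules that
I'm afraid users will find hard to predict.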
-- Alberto
********************************************************************
Alberto Accomazzi aaccomazzi(at)cfa harvard edu
NASA Astrophysics Data System ads.harvard.edu
Harvard-Smithsonian Center for Astrophysics www.cfa.harvard.edu
60 Garden St, MS 31, Cambridge, MA 02138, USA
********************************************************************