Rsync: Re: patch to enable faster mirroring of large filesystems
Lenny Foner
foner-rsync at media.mit.edu
Wed Nov 28 09:00:14 EST 2001
Date: Tue, 27 Nov 2001 15:34:15 -0600
From: Dave Dykstra <dwd at bell-labs.com>
[ . . . ]
No, the difficulty of turning on the optimization is irrelevant because the
optimization is no longer in the current version of rsync. It is only
needed to do the performance test which is a one-time thing.
Aha! I stand corrected. Thank you.
You seem to be missing my point. I agree that --files-from is useful even
if it has no impact or even negative impact on performance. Nevertheless,
I want to know what the impact on performance will be compared to using an
explicit include-from list, and I am bartering my volunteer effort of
developing the code for someone else's volunteer effort of doing
performance tests of the old optimized case which I expect to be
practically identical to the performance of --files-from. I personally
don't need --files-from because the --include-from list is working fine for
me, so I need extra motivation to put some time into it. I think it has to
be done much like that optimization was done and since I wrote the
optimization in the first place I expect it will probably be more efficient
for me to do it than it would be for somebody else to do it; otherwise I'd
probably just say forget it and wait for somebody else to write the code.
Ah. That wasn't clear to me until now, and might not have been clear
to others; my impression was that it would be deemed a bad idea to
supply --files-from -unless- it could be shown to be as efficient as
the original system---no matter who supplied the patch. I thought
that this was bottlenecking any possibility of getting such a patch
into the official release tree. Thanks for making this clear.
[ . . . ]
I'm pretty sure that rsync won't use up memory for excluded files so it
would make no difference.
...though this also implies (since you say it'd probably use basically
the same mechanism internally) that it -would- nonetheless keep info
around for the entire run about each file that -was- going to be/had been
transferred, yes? This is a separate problem from how the files are
selected, but I've lost track of what the right solution here should
be, except for dropping each directory's info after you leave it---
which would presumably not necessarily be easy if you're getting the
file list in arbitrary order via --files-from, but might be easier
if they were being generated via rsync's current traversal algorithm.
In any event, I -hope- that the memory issue is cleanly separable
from the issue of how files get selected; this might be a good time
to at least ponder the issue, if --files-from might soon exist.
Thanks again.
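
To make the memory question above concrete, here is a rough sketch in
Python. It is not rsync's code and does not describe how rsync is
actually implemented. It only contrasts two strategies: keeping an
entry for every file for the whole run, versus keeping only the
current directory's entries and dropping them on leaving it. All the
function names (make_tree, walk_all_at_once, walk_per_directory,
peak_batch_size) are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def make_tree(root, dirs, files_per_dir):
    """Create a small test tree: `dirs` subdirectories of empty files."""
    for d in range(dirs):
        sub = os.path.join(root, f"d{d}")
        os.makedirs(sub)
        for f in range(files_per_dir):
            open(os.path.join(sub, f"f{f}"), "w").close()


def walk_all_at_once(root):
    """Build the complete file list up front.

    Every entry stays in memory until the caller is done with the
    whole list, so memory grows with the size of the tree.
    """
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        for name in sorted(filenames):
            entries.append(os.path.join(dirpath, name))
    return entries


def walk_per_directory(root):
    """Yield one directory's files at a time.

    The caller can process a batch and let it go before the next
    directory is read, so only one directory's entries are live.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()
        yield dirpath, [os.path.join(dirpath, n) for n in sorted(filenames)]


def peak_batch_size(root):
    """Return (largest single batch, total files seen) for the tree."""
    peak = 0
    total = 0
    for _dirpath, batch in walk_per_directory(root):
        peak = max(peak, len(batch))
        total += len(batch)
        # `batch` is rebound on the next iteration, so the previous
        # directory's entries become garbage here.
    return peak, total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_tree(root, dirs=3, files_per_dir=4)
        print("entries held at once:", len(walk_all_at_once(root)))
        print("(peak batch, total):", peak_batch_size(root))
```

The per-directory version depends on the entries arriving grouped by
directory, which a tree traversal gives for free. A list supplied in
arbitrary order, as --files-from might allow, would not have that
property unless it were sorted or grouped first.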