Rsync: Re: patch to enable faster mirroring of large filesystems

Lenny Foner foner-rsync at media.mit.edu
Wed Nov 28 09:00:14 EST 2001


    Date: Tue, 27 Nov 2001 15:34:15 -0600
    From: Dave Dykstra <dwd at bell-labs.com>

    [ . . . ]

    No, the difficulty of turning on the optimization is irrelevant because the
    optimization is no longer in the current version of rsync.  It is only
    needed to do the performance test which is a one-time thing.

Aha!  I stand corrected.  Thank you.

    You seem to be missing my point.  I agree that --files-from is useful even
    if it has no impact or even negative impact on performance.  Nevertheless,
    I want to know what the impact on performance will be compared to using an
    explicit include-from list, and I am bartering my volunteer effort of
    developing the code for someone else's volunteer effort of doing
    performance tests of the old optimized case which I expect to be
    practically identical to the performance of --files-from.  I personally
    don't need --files-from because the --include-from list is working fine for
    me, so I need extra motivation to put some time into it.  I think it has to
    be done much like that optimization was done and since I wrote the
    optimization in the first place I expect it will probably be more efficient
    for me to do it than it would be for somebody else to do it; otherwise I'd
    probably just say forget it and wait for somebody else to write the code.

Ah.  That wasn't clear to me until now, and might not have been clear
to others; my impression was that it would be deemed a bad idea to
supply --files-from -unless- it could be shown to be as efficient as
the original system---no matter who supplied the patch.  I thought
that this was bottlenecking any possibility of getting such a patch
into the official release tree.  Thanks for making this clear.

    [ . . . ]

    I'm pretty sure that rsync won't use up memory for excluded files so it
    would make no difference.

...though this also implies (since you say it'd probably use basically
the same mechanism internally) that it -would- nonetheless keep info
around for the entire run about each file that -was- going to be, or had
been, transferred, yes?  This is a separate problem from how the files
are selected, but I've lost track of what the right solution here should
be, other than dropping each directory's info after you leave it.  That
presumably wouldn't be easy if you're getting the file list in arbitrary
order via --files-from, but might be easier if the files were being
generated via rsync's current traversal algorithm.
In any event, I -hope- that the memory issue is cleanly separable
from the issue of how files get selected; this might be a good time
to at least ponder the issue, if --files-from might soon exist.
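
To make the ordering concern concrete, here is a small illustrative
sketch.  It is not rsync's code and makes no claim about rsync's
internal data structures; the function names (peak_retained,
revisited_dirs) and the sample paths are hypothetical.  It only assumes
that "dropping each directory's info after you leave it" means
discarding the retained entries whenever the directory changes.

```python
import posixpath


def peak_retained(paths, drop_on_leave):
    """Return the peak number of file entries held in memory at once.

    With drop_on_leave=False every entry is kept for the whole run.
    With drop_on_leave=True the retained entries are discarded each
    time the directory component of the path changes.
    """
    retained = []
    current_dir = None
    peak = 0
    for path in paths:
        directory = posixpath.dirname(path)
        if drop_on_leave and directory != current_dir:
            retained.clear()  # we have left the previous directory
        current_dir = directory
        retained.append(path)
        peak = max(peak, len(retained))
    return peak


def revisited_dirs(paths):
    """Return directories that are left and later re-entered.

    Any directory in this set had its info dropped too early under
    the drop-on-leave scheme, because more of its files arrived later.
    """
    seen = set()
    revisited = set()
    current_dir = None
    for path in paths:
        directory = posixpath.dirname(path)
        if directory != current_dir:
            if directory in seen:
                revisited.add(directory)
            seen.add(directory)
            current_dir = directory
    return revisited


# Hypothetical orderings: one grouped by directory, one interleaved.
traversal_order = ["a/1", "a/2", "a/3", "b/1", "b/2"]
arbitrary_order = ["a/1", "b/1", "a/2", "b/2", "a/3"]

print(peak_retained(traversal_order, drop_on_leave=False))
print(peak_retained(traversal_order, drop_on_leave=True))
print(sorted(revisited_dirs(traversal_order)))
print(sorted(revisited_dirs(arbitrary_order)))
```

When the list is grouped by directory, nothing is revisited and the
peak is bounded by the largest directory.  When the list is interleaved,
directories are re-entered after being dropped, which is the case where
an arbitrary --files-from ordering would make per-directory cleanup
awkward.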

Thanks again.



