Feature request, or HowTo? State-full resume rsync transfer

Leen Besselink leen at consolejunky.net
Mon Jul 11 16:28:41 MDT 2011


On 07/11/2011 10:57 PM, Eberhard Moenkeberg wrote:
> Hi once more,
>
> On Mon, 11 Jul 2011, Eberhard Moenkeberg wrote:
>> On Mon, 11 Jul 2011, Donald Pearson wrote:
>
>>> I am looking to do state-full resume of rsync transfers.
>>>
>>> My network environment is an unreliable and slow satellite
>>> infrastructure, and the files I need to send are approaching 10 gigs
>>> in size.  In this network environment often times links cannot be
>>> maintained for more than a few minutes at a time.  In this
>>> environment, bandwidth is at a premium, which is why rsync was chosen
>>> as ideal for the job.
>>>
>>> The problem that I am encountering while using rsync in these
>>> conditions is that the connection between the client and server will
>>> drop due to network instability before rsync can transfer the entire
>>> file.
>>>
>>> Upon retries, rsync starts from the beginning, re-checking data that
>>> has already been sent, as well as re-building the checksum in its
>>> entirety every time.  Eventually I reach an impasse where the
>>> frequency of link loss prevents rsync from ever getting any new data
>>> to the destination.
>>>
>>> I've been reading through the various switches in the man pages to
>>> try to find a combination that will work.  My thinking was to use a
>>> combination of --partial and --append.  With the first attempt using
>>> the --partial switch, and subsequent attempts using both --partial
>>> and --append.  The idea being rsync would build a new "partial" file,
>>> and be able to resume building that file while making the assumption
>>> upon subsequent retries that the existing partial file, however large
>>> it may be, was assembled correctly and does not need to be checked.
>>>
>>> However in practice rsync does not work in this way.  I did not find
>>> any other switches or methods that would enable rsync to literally
>>> pick up where it left off, without destroying the original
>>> destination file, so that its blocks can be used to minimize
>>> transferred data and not need to always start from block #1.  Such
>>> that the aggregate of multiple rsync attempts is able to complete the
>>> transfer as a whole while still maintaining the minimum amount of
>>> data "on the wire" as if the file was sent in a single rsync session.
>>>
>>> If this is possible with rsync's current feature set I would be very
>>> appreciative of someone's time to reply with an example.
>>>
>>> Or if this is not currently possible, an idea that comes to mind and
>>> ultimately a feature request would be to have a switch that tells
>>> rsync upon session drop, to do a memory dump of its checksum list,
>>> and the last completed block worked on, to a provided file name
>>> specified by the switch.  This way, with a 2nd switch, rsync can be
>>> executed again and will reference this memory dump file, instead of
>>> rebuilding a new checksum list, and use that to pick up where it left
>>> off or "restore previous state", instead of starting over from
>>> block #1.
>>
>> In my experience, re-checking the already received "partial" blocks
>> takes about 3 minutes for a 4 GB partial file.
>
> I forgot to say: over a 56 kbit modem line.
>
>
> Viele Gruesse
> Eberhard Moenkeberg (emoenke at gwdg.de, em at kki.org)
>

Hello,

rsync tries to use as little bandwidth as possible, which means it will
check whether parts of the file already exist on the other side, even if
something was added at the start or in the middle of the file.

So it is normal for rsync to checksum parts of the file so that it knows
which parts haven't changed. With really big files this can take a long
time.
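
To illustrate why a block can still be recognised after data has been
inserted in front of it, here is a minimal Python sketch of a rolling
weak checksum of the kind rsync's algorithm is based on. It is a
simplified illustration, not rsync's actual code: the function names are
made up for this example, and the byte comparison stands in for the
strong checksum that rsync uses to confirm a weak match.

```python
M = 1 << 16  # modulus for both halves of the weak checksum


def weak_checksum(block):
    """Compute the (a, b) weak checksum of a block from scratch."""
    n = len(block)
    a = sum(block) % M
    # The first byte gets weight n, the last byte gets weight 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one position to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def find_block(data, block):
    """Return the first offset in data where block occurs, or -1."""
    n = len(block)
    if n == 0 or len(data) < n:
        return -1
    target = weak_checksum(block)
    a, b = weak_checksum(data[:n])
    for off in range(len(data) - n + 1):
        # Cheap weak match first, then confirm (rsync uses a strong
        # checksum here; this sketch simply compares the bytes).
        if (a, b) == target and data[off:off + n] == block:
            return off
        if off + n < len(data):
            a, b = roll(a, b, data[off], data[off + n], n)
    return -1
```

Because the window slides one byte at a time, a block is found at any
offset, not just at multiples of the block size. The price is that the
whole file has to be read and checksummed, which is the time cost
described above.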

If you regularly lose the connection then this might be a problem for
you, as rsync needs to restart each time.

I haven't checked this, but looking at the manual, maybe you can use
--append to skip the checksums when you have --partial enabled?

But you can probably only do that if you know the file content doesn't
change.
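
A rough, untested-against-a-real-link sketch of that idea as a Python
retry wrapper: the first attempt uses --partial only, and every retry
adds an append option. The function names, the 60 second --timeout, the
retry count and the delay are all assumptions made up for this example.
--append-verify needs rsync 3.0.0 or later; it appends like --append but
includes the existing data in the final whole-file checksum.

```python
import subprocess
import time


def build_rsync_cmd(src, dest, resume, verify=True):
    """Return the rsync argument list for a first attempt or a resume."""
    # --partial keeps a partially transferred file instead of deleting it;
    # --timeout makes rsync give up on a stalled link (value is arbitrary).
    cmd = ["rsync", "--partial", "--timeout=60"]
    if resume:
        # --append trusts the data already present on the receiver.
        # --append-verify also covers it in the final checksum.
        cmd.append("--append-verify" if verify else "--append")
    cmd += [src, dest]
    return cmd


def transfer_with_retries(src, dest, max_attempts=50, delay=30,
                          run=subprocess.call):
    """Retry until rsync exits 0; return the attempt count, or None."""
    for attempt in range(1, max_attempts + 1):
        cmd = build_rsync_cmd(src, dest, resume=attempt > 1)
        if run(cmd) == 0:
            return attempt
        time.sleep(delay)  # wait for the link to come back
    return None
```

It could be called as transfer_with_retries("big.img", "host:/data/big.img"),
where both paths are placeholders. As noted above, appending is only safe
when the source file does not change between attempts.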

Hope this was helpful.

Maybe someone else on the mailing list who knows better than me can
confirm or deny my ideas.

Have a nice day,
    Leen.


