Feature request, or HowTo? State-full resume rsync transfer

Eberhard Moenkeberg emoenke at gwdg.de
Mon Jul 11 14:57:06 MDT 2011


Hi once more,

On Mon, 11 Jul 2011, Eberhard Moenkeberg wrote:
> On Mon, 11 Jul 2011, Donald Pearson wrote:

>> I am looking to do state-full resume of rsync transfers.
>> 
>> My network environment is an unreliable and slow satellite
>> infrastructure, and the files I need to send are approaching 10 gigs in
>> size.  In this network environment, links often cannot be maintained
>> for more than a few minutes at a time.  In this environment, bandwidth is
>> at a premium, which is why rsync was chosen as ideal for the job.
>> 
>> The problem that I am encountering while using rsync in these conditions is
>> that the connection between the client and server will drop due to network
>> instability before rsync can transfer the entire file.
>> 
>> Upon retries, rsync starts from the beginning, re-checking data that has
>> already been sent and re-building the checksum in its entirety every
>> time.  Eventually I reach an impasse where the frequency of link loss
>> prevents rsync from ever getting any new data to the destination.
>> 
>> I've been reading through the various switches in the man pages to try to
>> find a combination that will work.  My thinking was to use a combination
>> of --partial and --append: the first attempt would use the --partial
>> switch, and subsequent attempts would use both --partial and --append.
>> The idea is that rsync would build a new "partial" file and be able to
>> resume building that file, assuming on subsequent retries that the
>> existing partial file, however large it may be, was assembled correctly
>> and does not need to be checked.
>> 
>> However, in practice rsync does not work this way.  I did not find any
>> other switches or methods that would enable rsync to literally pick up
>> where it left off, without destroying the original destination file, so
>> that its blocks can be used to minimize transferred data and avoid always
>> starting from block #1.  The goal is that the aggregate of multiple rsync
>> attempts is able to complete the transfer as a whole while still keeping
>> the amount of data "on the wire" to a minimum, as if the file had been
>> sent in a single rsync session.
>> 
>> If this is possible with rsync's current feature set, I would be very
>> appreciative if someone would take the time to reply with an example.
>> 
>> Or, if this is not currently possible, an idea that comes to mind, and
>> ultimately a feature request, would be a switch that tells rsync, upon
>> session drop, to dump its checksum list and the last completed block it
>> worked on to a file name provided by the switch.  This way, with a second
>> switch, rsync could be executed again and would reference this dump file
>> instead of rebuilding a new checksum list, and use it to pick up where it
>> left off ("restore previous state") instead of starting over from block #1.
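
For illustration, the --partial/--append combination described above would
look roughly like this (a sketch only; the file, host, and path names are
placeholders, and whether it resumes as hoped depends on the rsync version
in use):

  # First attempt: keep whatever arrives as a partial file if the link drops
  rsync -av --partial bigfile.img user@remote:/data/

  # Later attempts: keep the partial file and append to the data already
  # present on the receiving side instead of re-sending it
  rsync -av --partial --append bigfile.img user@remote:/data/

  # In practice the retries would be driven by a simple loop, e.g.:
  until rsync -av --partial --append bigfile.img user@remote:/data/; do
      sleep 30
  done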
>
> In my experience, re-checking the already received "partial" blocks takes 
> about 3 minutes for a 4 GB partial file.

I forgot to say: over a 56 kbit modem line.
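
A rough back-of-the-envelope check of that figure, assuming rsync's usual
block-size heuristic of roughly the square root of the file size (about
64 KiB blocks for a 4 GB file) and something on the order of 20 bytes of
checksum data per block:

  4 GB / 64 KiB blocks      ~  65,000 blocks
  65,000 blocks x 20 bytes  ~  1.3 MB of checksum data
  1.3 MB at 56 kbit/s       ~  190 seconds, i.e. about 3 minutes

so most of that time is plausibly spent just shipping the block checksum
list over the modem link, before any new file data moves at all.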


Viele Gruesse
Eberhard Moenkeberg (emoenke at gwdg.de, em at kki.org)

-- 
Eberhard Moenkeberg
Arbeitsgruppe IT-Infrastruktur
E-Mail: emoenke at gwdg.de      Tel.: +49 (0)551 201-1551

