Feature request, or HowTo? Stateful resume of rsync transfers

Donald Pearson donaldwhpearson at gmail.com
Tue Jul 12 09:10:48 MDT 2011


@Eberhard: I understand what you're trying to say, but in this environment
rsync reaches an impasse. It cannot get beyond the work it had already
completed before a link failure cut it off again.

@Leen: A combination of --append with --partial is what I tried. However,
--partial appears to be ignored when the --append switch is given, and no
partial file is generated.
--inplace leaves me with the same problem. Rsync still starts from the
beginning every time, and this network environment cannot support that.
Unfortunately --append implies --inplace, which I'm sure is the reason it is
incompatible with --partial.

@Matthias: --timeout actually causes more problems than it solves. Without
--timeout set, the server (target-side) rsync process keeps running,
apparently indefinitely, waiting for the client (source-side) to talk to it
again. This behavior would be excellent, except that the server seems to talk
only to the original rsync client. If the original client process ends and a
new one is started, the original server process does not respond to it.
Instead, a new rsync server process is created to talk to the new client,
and the new transaction is independent of the original one in every way.
This means that if --partial is included, you end up with two partial files.
The original partial file remains a partial file indefinitely. The second
process does not pick up where the first one ended; it simply starts over
from the beginning.

When --timeout is set, both the server and client processes wait the
specified number of seconds and then give up. When --partial is also used,
the server then overwrites the destination file with the partial file. This
is absolutely detrimental to sending only the differences.

Here is a listing of the server-side directory during and after running this
command:

rsync -hvzB=512 --partial --progress --timeout=30 /images/test/AllCafeFinal_2011-06-28.temp 192.168.0.27:/images/test/AllCafeFinal_2011-06-28.temp

A few moments after rsync has begun, the network cable between the two hosts
is removed.

Before:
analyst@ubuntudesktop1010:/images/test$ ls -lah
total 15G
drwxr-xr-x 2 analyst analyst  64K 2011-07-12 10:35 .
drwxr-xr-x 3 analyst root     48K 2011-07-07 16:05 ..
-rw-r--r-- 1 analyst analyst 6.1G 2011-07-12 10:13 AllCafeFinal_2011-06-28.temp
-rw-r--r-- 1 analyst analyst 3.3G 2011-07-11 14:00 AllCafeFinal_2011-06-28.temp.gz
-rw-r--r-- 1 analyst analyst 3.3G 2011-04-01 13:59 AllCafeFinal_2011-06-28.temp.gz.bak
-rw------- 1 analyst analyst 582M 2011-07-12 10:36 .AllCafeFinal_2011-06-28.temp.S6fwyP
-rw-r--r-- 1 root    root    1.1G 2011-07-11 11:23 rsync.pcap

As you can see, the destination file is 6.1 GB in size and the partial file
has grown to 582 MB. Now the network cable is pulled, and after 30 seconds
both the rsync client and server time out. By the way, the source file is
7.5 GB in this scenario.

After:
analyst@ubuntudesktop1010:/images/test$ ls -lah
total 8.3G
drwxr-xr-x 2 analyst analyst  64K 2011-07-12 10:37 .
drwxr-xr-x 3 analyst root     48K 2011-07-07 16:05 ..
-rw-r--r-- 1 analyst analyst 583M 2011-07-12 10:37 AllCafeFinal_2011-06-28.temp
-rw-r--r-- 1 analyst analyst 3.3G 2011-07-11 14:00 AllCafeFinal_2011-06-28.temp.gz
-rw-r--r-- 1 analyst analyst 3.3G 2011-04-01 13:59 AllCafeFinal_2011-06-28.temp.gz.bak
-rw-r--r-- 1 root    root    1.1G 2011-07-11 11:23 rsync.pcap
analyst@ubuntudesktop1010:/images/test$

Now you can see that the original 6.1 GB destination file is gone. It has
been replaced with the 583 MB partial file. If I run rsync again, it will
have to send the entire remainder of roughly 7 GB (the 7.5 GB source minus
the 583 MB already there). It can no longer send only the differences
between the source and the destination, because the destination has been
destroyed.
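A deliberately simplified Python model of rsync-style block matching shows why losing the basis file is so costly. It is not rsync's code: real rsync filters candidate windows with a cheap rolling checksum before computing a stronger hash, whereas this sketch hashes every window with MD5 and, for simplicity, ignores a trailing short block of the basis. The function name `literal_bytes_needed`, the scale (1 byte standing in for 1 MB) and the assumption that the old destination equals the start of the new source are all illustrative.

```python
import hashlib
import random


def literal_bytes_needed(source: bytes, basis: bytes, block: int) -> int:
    """Count how many bytes of `source` would have to be sent literally.

    `basis` (the receiver's existing file) is cut into fixed-size blocks;
    `source` is scanned one byte at a time for a window whose hash equals
    one of those blocks. A trailing short block of `basis` is ignored.
    """
    known = {
        hashlib.md5(basis[i:i + block]).digest()
        for i in range(0, len(basis) - block + 1, block)
    }
    literal = 0
    i = 0
    while i < len(source):
        window = source[i:i + block]
        if len(window) == block and hashlib.md5(window).digest() in known:
            i += block    # matched: only a reference to the block is sent
        else:
            literal += 1  # no match here: this byte is sent as literal data
            i += 1
    return literal


# Scaled-down version of the listings above (1 byte ~ 1 MB): a 7,500-byte
# source, an old 6,100-byte destination and a 583-byte partial file.
source = random.Random(0).randbytes(7500)
old_destination = source[:6100]  # assumption: old file is a prefix of source
partial = source[:583]           # what is left after the partial replaced it

print(literal_bytes_needed(source, old_destination, 50))
print(literal_bytes_needed(source, partial, 50))
```

With the intact 6,100-byte basis, only the 1,400-byte tail is literal data. Once the basis has been replaced by the 583-byte partial file, 6,950 of the 7,500 bytes have to travel over the link again.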

--inplace leaves me with the problem of re-checking data that has already
been sent, every time, and I need to avoid that. --append implies --inplace
and so inherits its issues, which means it isn't an option either.
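A toy model makes the impasse concrete. It assumes, as described above, that every retry must first re-check everything the destination already holds before any new data flows. The numbers (a 5-minute link lifetime, 10 MB/min of new data, 100 MB/min of re-checking) are illustrative guesses rather than measurements, and the function `attempts_to_finish` and its parameters are hypothetical.

```python
def attempts_to_finish(total_mb, uptime_min, send_rate, verify_rate,
                       reverify, max_attempts=1000):
    """Toy model of retrying a transfer over a link that drops after
    `uptime_min` minutes.

    If `reverify` is True, each attempt first spends done / verify_rate
    minutes re-checking data already at the destination, and only then
    sends new data at `send_rate` MB/min until the link drops. If False,
    each attempt resumes immediately (a "true" resume).
    Returns (attempt that finished, MB transferred), with None as the
    attempt if the transfer does not complete within `max_attempts`.
    """
    done = 0.0
    for attempt in range(1, max_attempts + 1):
        busy = done / verify_rate if reverify else 0.0
        done = min(total_mb, done + send_rate * max(0.0, uptime_min - busy))
        if done >= total_mb:
            return attempt, done
    return None, done


# Illustrative numbers only: a 7,500 MB file, a link that stays up for
# 5 minutes per attempt, 10 MB/min of new data, 100 MB/min of re-checking.
print(attempts_to_finish(7500, 5, 10, 100, reverify=True))
print(attempts_to_finish(7500, 5, 10, 100, reverify=False))
```

With re-checking, progress creeps toward uptime times re-check rate (500 MB here) and stalls there, never completing. With a true resume, the same link finishes the file in 150 attempts.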

As far as I can tell, rsync is unable to resume a failed transfer. At best,
rsync starts a fresh transfer, which by design does not need to re-send data
the destination already has. Of course I am very open to being wrong about
this, and I appreciate everyone who has taken the time to consider my issue
and offer suggestions.

Three potential ways to accomplish a true resume come to mind:
 - keep the indefinite timeout of the rsync server, but enable it to accept
new client connections so it can continue writing to the same partial file.
 - add a feature that lets rsync dump its in-memory state on timeout, so that
a new rsync process can use that dump to truly pick up where the previous
process stopped.
 - modify --append so that it no longer implies --inplace and is compatible
with --partial. Right now the observed behavior of --append is that rsync
looks at the size of the destination file and then skips that much data
from the front of the source file. If --append is given with --partial,
rsync should instead check whether a partial file exists and look at the
size of that partial file, not the destination file. The rsync client can
then skip that much of the source file (as much as already exists in the
partial file) and append to the partial file rather than directly to the
destination file. This also means the partial file must not overwrite the
destination file until it is a true copy of the source.
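The third proposal can be sketched with ordinary file operations. This is a hypothetical Python simulation of the proposed behavior, not rsync code: `resume_append`, `link_budget` and the partial-file name are illustrative, and a link failure is modelled simply as a byte budget per attempt. It resumes from the size of the partial file, appends only to the partial file, and lets it replace the destination only once a whole-file checksum matches the source.

```python
import hashlib
import os


def _sha256(path):
    """Whole-file digest, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def resume_append(source_path, dest_path, partial_path, link_budget,
                  chunk_size=64 * 1024):
    """Hypothetical --append + --partial semantics, simulated on local files.

    `link_budget` is how many bytes get through before the link drops.
    Returns True once the partial file has become a full copy of the source
    and has replaced the destination; until then the destination is untouched.
    """
    # Resume point = size of the partial file, not of the destination.
    done = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    with open(source_path, "rb") as src, open(partial_path, "ab") as part:
        src.seek(done)  # skip what the partial file already holds
        while link_budget > 0:
            data = src.read(min(chunk_size, link_budget))
            if not data:  # reached the end of the source
                break
            part.write(data)
            link_budget -= len(data)
    # Only a verified, complete copy may replace the destination.
    if (os.path.getsize(partial_path) == os.path.getsize(source_path)
            and _sha256(partial_path) == _sha256(source_path)):
        os.replace(partial_path, dest_path)
        return True
    return False
```

Calling `resume_append` repeatedly with a small budget grows the partial file attempt by attempt while the old destination stays intact as a basis. The final checksum comparison mirrors the requirement that the partial file must not overwrite the destination until it is a true copy; rsync itself already verifies every transferred file against a whole-file checksum.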

Unfortunately I am not able to implement any of these capabilities myself.

Regards,
Donald



On Tue, Jul 12, 2011 at 3:52 AM, Matthias Schniedermeyer <ms at citd.de> wrote:

> On 11.07.2011 16:01, Donald Pearson wrote:
> > I am looking to do state-full resume of rsync transfers.
> >
> > My network environment is an unreliable and slow satellite
> > infrastructure, and the files I need to send are approaching 10 gigs in
> > size. In this environment, links often cannot be maintained for more
> > than a few minutes at a time, and bandwidth is at a premium, which is why
> > rsync was chosen as ideal for the job.
> >
> > The problem I am encountering while using rsync in these conditions is
> > that the connection between the client and server drops due to network
> > instability before rsync can transfer the entire file.
> >
> > Upon retries, rsync starts from the beginning, re-checking data that has
> > already been sent and rebuilding the checksum in its entirety every
> > time. Eventually I reach an impasse where the frequency of link loss
> > prevents rsync from ever getting any new data to the destination.
> >
> > I've been reading through the various switches in the man pages to try
> > to find a combination that will work. My thinking was to combine
> > --partial and --append, with the first attempt using the --partial
> > switch and subsequent attempts using both --partial and --append. The
> > idea was that rsync would build a new "partial" file and, on subsequent
> > retries, resume building that file on the assumption that the existing
> > partial file, however large, was assembled correctly and does not need
> > to be checked.
> >
> > However in practice rsync does not work in this way.
>
> I think you didn't wait for the target rsync to complete, if a
> connection breaks, you have 2 parts left hanging. The less visible
> target-side is the important one here. That rsync has to "complete"
> before you do another try. Depending on how your connection drops it MAY
> hang for some time. I don't remember if rsync does "the right thing"
> if you just kill it, or if you have to wait for it. In the latter case
> "--timeout" sounds like it can be used to expedite matters.
>
> And also --inplace, with or without --append, reads like it is what you
> want, if you can live with its caveats.
>
> See you then
>


More information about the rsync mailing list