Suggested change to "--partial" usage.

Wallace Matthews wmatthews at sepaton.com
Thu Jun 17 12:34:05 GMT 2004


I am working with individual files that can be as large as 100 GB, and I exclusively use the "push" model. When there is a broken pipe (usually a timeout or a temporary network problem), it would be nice if the local end could attempt to reopen the pipe and resume building the file. I know this involves some special work: checkpoint the .<file> at the remote end, and remember the .<file>, the checksums, and the checkpoint. Then on a "restart" you could ask the remote end to truncate the .<file> back to the last checkpoint (this might take a bit of magic) and restart the local processing with the saved checksums at the checkpoint.
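
To make the idea concrete, here is a rough sketch of the checkpoint/truncate/resume cycle, simulated on local files. It is not how rsync works today; the names (Checkpoint, push, fail_after, CHUNK) and the choice of SHA-256 are all made up for illustration, and the "saved checksums" are kept in memory rather than on disk.

```python
import hashlib
import os
import tempfile

CHUNK = 4  # bytes per checkpoint interval; tiny so the demo is easy to trace


class Checkpoint:
    """Offset into the file plus the running checksum state at that offset."""

    def __init__(self, offset=0, hasher=None):
        self.offset = offset
        self.hasher = hasher if hasher is not None else hashlib.sha256()


def push(src_path, partial_path, ckpt, fail_after=None):
    """Copy src_path into partial_path, starting from ckpt.

    fail_after simulates a broken pipe once that many bytes have been written
    in this attempt. Returns (finished, last_good_checkpoint).
    """
    mode = "r+b" if os.path.exists(partial_path) else "w+b"
    with open(src_path, "rb") as src, open(partial_path, mode) as dst:
        # Throw away anything written past the last checkpoint.
        dst.truncate(ckpt.offset)
        dst.seek(ckpt.offset)
        src.seek(ckpt.offset)
        hasher = ckpt.hasher.copy()  # resume from the saved checksum state
        offset, written = ckpt.offset, 0
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                return True, ckpt
            if fail_after is not None and written + len(chunk) > fail_after:
                dst.write(chunk[: fail_after - written])  # torn write
                return False, ckpt
            dst.write(chunk)
            hasher.update(chunk)
            offset += len(chunk)
            written += len(chunk)
            ckpt = Checkpoint(offset, hasher.copy())


def demo():
    data = b"0123456789abcdef"
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "big.bin")
        partial = os.path.join(tmp, ".big.bin")
        with open(src, "wb") as f:
            f.write(data)
        ok, ckpt = push(src, partial, Checkpoint(), fail_after=10)
        first = (ok, ckpt.offset, os.path.getsize(partial))
        ok, ckpt = push(src, partial, ckpt)
        with open(partial, "rb") as f:
            rebuilt = f.read()
        same_sum = ckpt.hasher.hexdigest() == hashlib.sha256(data).hexdigest()
        return first, ok, rebuilt == data, same_sum
```

In the demo the first attempt dies after 10 bytes, leaving 2 bytes past the last checkpoint at offset 8. The second attempt truncates back to 8 and finishes without rereading the first 8 bytes of the source.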

This only makes sense for large files. I wouldn't bother for files under a threshold of, say, 100 megabytes.

If you asked me for a prioritized list of things I need from rsync for my application, this would be numero uno. 

For test cases over the corporate intranet (GigE) I seldom see broken pipes unless my remote system fills all the disk space. But when I run tests between my system at home and my system at work, they happen frequently. I am pretty sure that the glitches are caused by Comcast in the last hop to the house. Still, it is representative of what you have to expect when using the public network.

FYI, my numero dos priority is more definitive reporting of why a pipe broke. If it is actually a full disk or a system problem at the remote end, I don't want an automagic restart. If it is a network-related issue, I do want an automagic restart.
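
A minimal sketch of that restart policy, assuming the real cause of the failure reaches the local end as an errno (which is exactly the reporting being asked for here, not something rsync provides in this form). The two sets and the function name should_restart are illustrative, and the errno lists are not exhaustive.

```python
import errno

# Failures at the remote end that a retry will not fix.
REMOTE_FATAL = {errno.ENOSPC, errno.EROFS, errno.EIO}

# Failures that look like a flaky link and are worth retrying.
NETWORK_TRANSIENT = {
    errno.EPIPE,
    errno.ECONNRESET,
    errno.ECONNABORTED,
    errno.ETIMEDOUT,
    errno.ENETUNREACH,
    errno.EHOSTUNREACH,
}


def should_restart(exc):
    """Return True only for failures that look network related.

    Anything unknown is treated like a remote-end problem, so an unattended
    job never loops on an error nobody has classified.
    """
    if exc.errno in REMOTE_FATAL:
        return False
    return exc.errno in NETWORK_TRANSIENT
```

For example, should_restart(OSError(errno.EPIPE, "broken pipe")) says to restart, while should_restart(OSError(errno.ENOSPC, "disk full")) does not.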

Since this will be an unattended operation run by monkeys, the last thing you want is to have the monkeys decide when and what to restart.

wally

-----Original Message-----
From: Chris Shoemaker [mailto:c.shoemaker at cox.net]
Sent: Wednesday, June 16, 2004 1:38 PM
To: Jason Potter
Cc: rsync at lists.samba.org
Subject: Re: Suggested change to "--partial" usage.


On Wed, Jun 16, 2004 at 08:30:16PM +0800, Jason Potter wrote:
> Hi There,
> 
> This post was prompted by the following two messages:
> 
> http://www.mail-archive.com/rsync@lists.samba.org/msg10702.html
> 
> http://www.mail-archive.com/rsync@lists.samba.org/msg10709.html
> 
> I have a situation where I need to upload large files over an unstable link
> (resuming is a requirement) and only when they are complete can they be
> renamed and hence replace the original file.  The process is an automated
> one so I don't have the ability to just walk over to the machine and see if
> it is finished by manually comparing the file size at the source and
> destination.
> 
> I have looked at the source code and have an alternative suggestion for the
> way --partial works.  I am looking for comments on whether people think this
> is a valid suggestion, or whether I am missing something.
> 
> 1)       In cleanup.c the function _exit_cleanup will call finish_transfer
> regardless of the code the function receives as a parameter. (See errcode.h
> for the codes it could receive.)
> 
> 2)       I suggest writing a new function that changes the name the temp
> file is written to, so that it is the correct name plus a known extension.
> This achieves the requirement that the original file does not get
> overwritten if the file is not complete.
> 
> 3)       If the above were implemented, you would then just have to adjust
> the syncing code to check for these temporary files and use them to resume
> the connections.
> 
> 4)       When the file is complete, _exit_cleanup gets a code value of 0
> (zero), and this can be used to call the existing finish_transfer.
> 
> So what do you all think, will it work?
> 
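
The proposal above can be sketched roughly as follows. This is a simulation in Python, not a patch to rsync's C code: PARTIAL_SUFFIX, stream and receive are hypothetical names, and the sketch simply appends to the saved partial file, so it ignores the case where the source changes between attempts.

```python
import os

PARTIAL_SUFFIX = ".partial"  # hypothetical "known extension"


def stream(data, start, fail_at=None):
    """Yield data[start:] in 4-byte chunks; raise at fail_at to mimic a dropped link."""
    for pos in range(start, len(data), 4):
        if fail_at is not None and pos >= fail_at:
            raise ConnectionError("link dropped")
        yield data[pos:pos + 4]


def receive(dest_path, data, fail_at=None):
    """Build dest_path + PARTIAL_SUFFIX and rename it only on a clean finish."""
    partial = dest_path + PARTIAL_SUFFIX
    # Resume from whatever an earlier attempt left behind.
    start = os.path.getsize(partial) if os.path.exists(partial) else 0
    with open(partial, "ab") as out:
        for chunk in stream(data, start, fail_at):
            out.write(chunk)
    # Reached only when the transfer completed (the "exit code 0" case).
    os.replace(partial, dest_path)
```

If the link drops, the exception leaves the partial file in place and the original destination untouched; the next call to receive picks up where the partial file ends.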

I can see the usefulness of such a feature, but ...

what if the portion of the source or destination file that was
already transferred (and which was stored in the temporary file) is modified
between subsequent attempts?  (I don't mean the temp file changes, I mean
the actual src or dest.)  ISTM, you'd have to at least _check_ for this
condition.  That means rechecksumming the source file on the sender from the
beginning.  On the receiver side, it means checksumming both the
(possibly modified) destination file and the previously saved temp file.
Then you have a 3-way compare:
   sumA = checksum from source file
   sumB = checksum from possibly modified dest
   sumT = checksum from previously saved temp

   if sumA == sumB, do nothing.
   if sumA != sumB && sumA == sumT,
      then no retransmit needed, use block from tempfile
   if sumA != sumB && sumA != sumT, 
      then retransmit anyway, throw away temp block.
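
Spelled out per block, the compare above might look like the sketch below. It assumes fixed-size blocks and SHA-256 sums purely for illustration (the real rsync algorithm uses rolling checksums and does not compare blocks position by position); Action, block_sums and plan are invented names.

```python
import hashlib
from enum import Enum


class Action(Enum):
    KEEP_DEST = "do nothing"
    USE_TEMP = "use block from tempfile"
    RETRANSMIT = "retransmit, throw away temp block"


def block_sums(data, size):
    """Checksum each fixed-size block of data."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def decide(sum_a, sum_b, sum_t):
    """sum_a: source, sum_b: possibly modified dest, sum_t: saved temp."""
    if sum_a == sum_b:
        return Action.KEEP_DEST
    if sum_a == sum_t:
        return Action.USE_TEMP
    return Action.RETRANSMIT


def plan(src, dest, temp, size=4):
    """Return one Action per source block; a missing block never matches."""
    sums_b, sums_t = block_sums(dest, size), block_sums(temp, size)
    actions = []
    for i, sum_a in enumerate(block_sums(src, size)):
        sum_b = sums_b[i] if i < len(sums_b) else None
        sum_t = sums_t[i] if i < len(sums_t) else None
        actions.append(decide(sum_a, sum_b, sum_t))
    return actions
```

With src = b"aaaabbbbcccc", dest = b"aaaaXXXXYYYY" and a temp file holding only b"aaaabbbb", the first block is left alone, the second comes from the temp file, and the third has to be retransmitted.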


Interesting.  I haven't delved into the core rsync algorithm enough to
say for sure that this is possible, but I don't know that it's
_im_possible.  :-)

-chris

> 
> I look forward to your responses.
> 
> Cheers
> 
> Jason
> 
> -- 
> To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html