Maximum file size with rsync?
jw at pegasys.ws
Wed Aug 13 07:53:42 EST 2003
On Tue, Aug 12, 2003 at 11:27:28AM -0700, Jeff Frost wrote:
> On Tue, 12 Aug 2003, Steve Bonds wrote:
> > On Tue, 12 Aug 2003, Jeff Frost J.Frost-at-fantastic.com |Rsync List| wrote:
> > > I've been trying to find mention of rsync's maximum transferable file
> > > size, but haven't been able to. I'm specifically curious whether it
> > > would be possible to send a file in the neighborhood of 200GB with
> > > rsync, assuming the filesystems on both ends could handle it?
> > Before embarking on this, check the archives for a discussion of how
> > the rsync algorithm breaks down on files with more than a few hundred
> > megabytes of changes, and how you can correct for this by adjusting
> > the block size.
> > I don't think rsync would work well for files that large if they contained
> > lots of changes.
> I was thinking of using the whole-file option instead of letting it
> attempt to chew through the files for diffs, as I had problems with
> rsync dealing with files in the 12-15GB range in the past, though it
> worked great when using --whole-file. Would there be a more efficient
> program to use for just moving the bits over there?
If you had checked the archives you would know that changes
have been made in CVS (not yet released) to deal with this
issue. How well this will work on a 200GB file is so far
only a matter of theory, but it would be most helpful if you
could try it and let us know.
If you do use 2.5.6 or earlier, I don't think --whole-file is
going to speed things up unless the file is mostly changed.
Increasing the --block-size will help, although 2.5.6 does
this automatically to a limited degree. There is a table in
the list archives (not the ones from me) that indicates the
right block sizes to use for large files.
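As a rough illustration of the block-size adjustment (a sketch of the
square-root heuristic that has come up on the list, not rsync's exact
internal formula; the function name here is hypothetical):

```python
import math

def suggested_block_size(file_size):
    # Assumption: pick a block size near sqrt(file_size) so the block
    # count, and with it the chance of phase-1 checksum collisions,
    # stays manageable.  700 bytes was rsync's old fixed default.
    return max(700, int(math.sqrt(file_size)))

print(suggested_block_size(200 * 10**9))  # a 200GB file -> ~437KB blocks
```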
If you are transferring only one file and intend to use
--whole-file, I wouldn't use rsync. scp would be faster, as
would gzip -c filename | ssh remote "gzip -d >filename".
Reading the archives may actually confuse the issue because
there are misstatements there. What happens with large
files is that checksum collisions occur because of the
number of blocks in the file. This results in some
mismatched blocks being used. When the transfer (phase 1)
is complete, the whole-file checksum detects the error and
the transfer is repeated (phase 2) using full checksums; it
does not fall back to whole-file. Phase 1 is done using 6
bytes of checksum per block and phase 2 uses 20 bytes. Both
phases use the same block size. Given that each pass means
2+ file accesses on the receiver, the network-speed
threshold at which --whole-file beats the rsync algorithm is
lower when the transfer takes 2 passes. The change in CVS
adjusts the block size and the phase-1 checksum size so that
the likelihood of a checksum collision becomes infinitesimal.
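The collision argument above can be made concrete with a birthday-bound
estimate (a back-of-the-envelope sketch, not rsync's actual failure
model; the function name is hypothetical):

```python
def expected_collisions(file_size, block_size, checksum_bytes):
    # Birthday bound: with n blocks and a b-bit per-block checksum,
    # the expected number of colliding block pairs is roughly
    # n*(n-1)/2 / 2**b.
    n = file_size // block_size
    return n * (n - 1) / 2 / 2 ** (8 * checksum_bytes)

# 200GB with 700-byte blocks and the 6-byte phase-1 checksum: on the
# order of a hundred expected collisions, so phase 1 is likely to
# mismatch some blocks.
print(expected_collisions(200 * 10**9, 700, 6))
# The 20-byte phase-2 checksum drives the same estimate to effectively zero.
print(expected_collisions(200 * 10**9, 700, 20))
```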
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt