rsync with large gzip files.

jw schultz jw at pegasys.ws
Sun Oct 27 17:32:00 EST 2002


On Sun, Oct 27, 2002 at 06:36:49PM +0800, Leaw, Chern Jian wrote:
> Hi,
> 
> I tried performing a complete copy of 17GB of filesystems over the WAN
> at a speed of 16Mbps (0.8GB/hr). The filesystem consists of several
> large g-zipped files. These large g-zipped files have actually been zipped
> up from other sub-filesystems and directories. I noticed that while
> transferring a list of large g-zipped files, rsync tends to take a much
> longer time to transfer those files and at times, it even hangs.

You don't say how large.  Are these files larger than 512MB or 2GB?

> I invoked rsync to transfer these large file-systems from Host A to Host B
> as follows: 
> From Host A:
> # rsync -avz  /fs10/archives/archive.gz hostB:/fs10/archives  

They are already compressed.  Drop the -z.
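For already-gzipped files the corrected invocation would look something like the following (hostB and the paths are the poster's; this is a sketch, not a tested recipe):

```shell
# -z recompresses data on the wire.  The archive is already gzipped,
# so -z burns CPU for little or no gain; leave it off.
rsync -av /fs10/archives/archive.gz hostB:/fs10/archives
```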

> In the case of transferring those large g-zipped files, it hangs in the
> midst of the transfer or even at the end of the transfer, giving no
> indication whatsoever if rsync had successfully copied those files across. 
> 
> The rsync version which I'm running is
> # /usr/bin/rsync --version 
> rsync version  2.4.6 protocol version 24

Upgrade.  2.4.6 is ancient.  There have been many bugs fixed
since then.  Use 2.5.5 or CVS.  Be sure you build with
large-file support.

> I'm running the mentioned version of rsync on the following platforms: 
> Linux RPM 7.1 
> HP-UX 11.00 (OS version level: E, OS release level B.11.00)  
> IBM AIX (release 3, OS version 4).

I recall HP-UX having large-file problems.  You might want to
search the archives, but I think it just requires explicitly
telling configure to enable it.

Just be aware that AIX has mount-option and filesystem-creation
issues for large-file support.

> Those g-zipped files were compressed using gzip version 1.2.4(18 Aug 93),
> and with the following method:
> # cat dir1 dir2 dir3 ... | gzip > archives.gz

You must have a version of cat I've never heard of; standard
cat can't read directories as input.
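The usual way to bundle directory trees into one compressed file is tar piped through gzip.  A minimal sketch, reusing the poster's dir1/dir2/dir3 and archives.gz names:

```shell
# cat ignores or errors on directories; tar is the standard tool for
# serializing a directory tree, which gzip then compresses.
tar cf - dir1 dir2 dir3 | gzip > archives.gz

# To restore the tree later:
#   gzip -dc archives.gz | tar xf -
```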

> I was wondering if there is a work-around to overcome this problem which I'm
> facing? Are there any switches in rsync or scripts which I need to implement
> to solve this glitch?

Below I mention a couple of issues you will want to
consider.  They are unlikely to be related to your current
problem but should be considered with regard to your
approach.

Rsyncing of gzipped files is problematic.  Generic gzip
will effectively defeat the rsync algorithm from the point where
the uncompressed content has its first change.  In the case
of file-tree archives the first change will often be near the
beginning.  There is a patch for gzip that can help by
adding an option specifically for rsync:
http://rsync.samba.org/ftp/unpacked/rsync/patches/gzip-rsyncable.diff
This results in slightly less efficient compression
but makes it possible for rsync to be useful.
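With a gzip built from that patch (later GNU gzip releases ship the option), the archive step becomes a sketch like:

```shell
# Assumes gzip supports --rsyncable, i.e. it was built with the
# gzip-rsyncable.diff patch or is a recent GNU gzip.  The option
# resets the compressor's state periodically, so a change early in
# the input does not ripple through the whole compressed stream and
# rsync can still match the unchanged tail.
tar cf - dir1 dir2 dir3 | gzip --rsyncable > archives.gz
```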

If these filesystems are primarily large (>100MB) files, you may
benefit from using a larger rsync block size.  There was a
recent discussion on this list of performance issues related to
hash collisions on large files.
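rsync exposes this through its -B/--block-size option.  A sketch, again with the poster's paths; the 64KB value is only an illustration, not a tuned recommendation:

```shell
# rsync derives a checksum block size from the file size.  For
# multi-GB files, forcing a larger block size reduces the number of
# blocks, and with it the memory used for the checksum table and the
# chance of hash collisions.
rsync -av --block-size=65536 /fs10/archives/archive.gz hostB:/fs10/archives
```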

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt


