Future RSYNC enhancement/improvement suggestions

Stefan Nehlsen sn at ParlaNet.de
Mon Apr 22 03:18:02 EST 2002


On Fri, Apr 19, 2002 at 12:23:06PM +0200, Jan Rafaj wrote:
> 
> Hello,
> 
> Recently, while using rsync to mirror a large (several GB) archive
> on a regular basis, I came across several problems and also had
> some ideas about possible solutions. Could you please investigate
> and consider implementing the features described below in future
> RSYNC releases?
> 
> - When the checksumming stage of rsync runs slowly (~3 minutes or
>   more on a very large archive of several GB), which happens on
>   machines with slower CPUs or with older HDDs that lack UDMA or
>   only support UDMA33, the network connection to the master site
>   often shuts down and the mirroring fails (in subsequent mirroring
>   attempts, when, e.g., about 90% of the archive has already been
>   transferred). I think this happens because the open bidirectional
>   connection is simply reset by either the client or the server:
>   rsync does not transfer anything while the checksumming runs
>   (I might be wrong, but this is what I observed), and the TCP
>   connection is reset because of the stall (I have no clue by what
>   mechanism, because I'm no TCP/IP expert, but I suspect it might
>   be TCP/IP itself).
>   How about keeping the checksums separately on the HDD in a
>   Berkeley-style database and, on subsequent mirroring attempts,
>   looking them up there, so that rsync does not need to checksum
>   the whole target (already mirrored) file tree? Implementing this
>   could take some time, but it would certainly improve rsync's
>   responsiveness and make it easier to use with slow CPUs and HDDs.

The problem is that the generator works in the following steps:

1. For each block, both checksums are calculated and stored in a table.

2. The number of entries in the table is sent to the sender.

3. The contents of the table are sent to the sender.

4. The table is thrown away.
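A minimal C sketch of these four steps, for illustration only. It reads from an in-memory buffer instead of a file, computes only a simplified Adler-32-style weak checksum (rsync's real rolling checksum differs in detail, and the strong checksum is omitted), and models the connection to the sender with a hypothetical `struct channel`. The names `struct block_sum`, `weak_sum`, `send_sum` and `generate_with_table` are likewise hypothetical, not rsync's code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-block record (rsync's real structures differ). */
struct block_sum {
    uint32_t weak;  /* simplified weak checksum of the block */
    size_t offset;  /* start of the block in the file */
    size_t len;     /* block length; the last block may be shorter */
};

/* Hypothetical stand-in for the connection to the sender. */
struct channel {
    size_t count_sent;  /* the entry count sent in step 2 */
    uint32_t sums[64];  /* weak sums received (only the first 64 kept) */
    size_t n_sums;      /* number of records sent in step 3 */
};

/* Simplified Adler-32-style weak checksum, not byte-exact with rsync. */
static uint32_t weak_sum(const unsigned char *p, size_t n)
{
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < n; i++) {
        s1 += p[i];
        s2 += s1;
    }
    return (s1 & 0xffff) | (s2 << 16);
}

/* "Send" one record by recording it in the channel. */
static void send_sum(struct channel *ch, const struct block_sum *s)
{
    if (ch->n_sums < sizeof ch->sums / sizeof ch->sums[0])
        ch->sums[ch->n_sums] = s->weak;
    ch->n_sums++;
}

/* Table-based generator: returns 0 on success, -1 if allocation fails. */
static int generate_with_table(const unsigned char *data, size_t len,
                               size_t blength, struct channel *ch)
{
    assert(blength > 0);
    size_t count = len / blength + (len % blength != 0);

    /* 1. checksum every block into a table held entirely in memory */
    struct block_sum *table = malloc((count ? count : 1) * sizeof *table);
    if (table == NULL)
        return -1;
    for (size_t i = 0; i < count; i++) {
        size_t off = i * blength;
        size_t n = len - off < blength ? len - off : blength;
        table[i].weak = weak_sum(data + off, n);
        table[i].offset = off;
        table[i].len = n;
    }

    /* 2. send the number of entries */
    ch->count_sent = count;

    /* 3. send the contents of the table */
    for (size_t i = 0; i < count; i++)
        send_sum(ch, &table[i]);

    /* 4. throw the table away */
    free(table);
    return 0;
}
```

In this shape the whole table exists in memory before anything is sent, and nothing reaches the sender until every block of the file has been checksummed.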

There is no real need to do this in 4 steps.

It should be possible to change this without changing the protocol.

 - The number of entries can be calculated from the block size and
   the size of the (flat) file, so it can be sent to the sender first.

 - The rest can be done in a loop:

	* read a block
	* calculate the checksums for this block and fill a sum_struct
	* send this sum_struct to the sender
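The count in the first point needs no table: every block except possibly the last has the same length, so it is a ceiling division and can be sent before a single block has been read. A minimal sketch; `block_count` and `last_block_remainder` are hypothetical names, and the 700-byte block length below is only an example value.

```c
#include <assert.h>
#include <stddef.h>

/* Number of blocks needed to cover file_len bytes (ceiling division,
 * written so that it cannot overflow for large file_len). */
static size_t block_count(size_t file_len, size_t blength)
{
    assert(blength > 0);
    return file_len / blength + (file_len % blength != 0);
}

/* Length of the final short block, or 0 if the file divides evenly. */
static size_t last_block_remainder(size_t file_len, size_t blength)
{
    assert(blength > 0);
    return file_len % blength;
}
```

For example, a 1,000,000-byte file with 700-byte blocks gives 1428 full blocks plus one of 400 bytes, so `block_count(1000000, 700)` is 1429 and `last_block_remainder(1000000, 700)` is 400.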

The code will become a little more complicated, but it will use
less memory, may be a bit faster, and the checksums will start
flowing to the sender while the file is still being read.
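A minimal C sketch of the proposed loop, for illustration only: the count is computed from the file length and sent first, then each block is read, checksummed and sent immediately, so only one block buffer is held in memory. It computes only a simplified Adler-32-style weak checksum (not byte-exact with rsync; the strong checksum is omitted), takes the file length as a parameter (real code would get it from `stat()`), and models the connection with a hypothetical `struct channel`. All names here are hypothetical, not rsync's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-block record (rsync's real structures differ). */
struct block_sum {
    uint32_t weak;  /* simplified weak checksum of the block */
    size_t offset;  /* start of the block in the file */
    size_t len;     /* block length; the last block may be shorter */
};

/* Hypothetical stand-in for the connection to the sender. */
struct channel {
    size_t count_sent;  /* the entry count, sent before any block */
    uint32_t sums[64];  /* weak sums received (only the first 64 kept) */
    size_t n_sums;      /* number of records sent */
};

/* Simplified Adler-32-style weak checksum, not byte-exact with rsync. */
static uint32_t weak_sum(const unsigned char *p, size_t n)
{
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < n; i++) {
        s1 += p[i];
        s2 += s1;
    }
    return (s1 & 0xffff) | (s2 << 16);
}

/* "Send" one record by recording it in the channel. */
static void send_sum(struct channel *ch, const struct block_sum *s)
{
    if (ch->n_sums < sizeof ch->sums / sizeof ch->sums[0])
        ch->sums[ch->n_sums] = s->weak;
    ch->n_sums++;
}

/* Streaming generator: returns 0 on success, -1 on error. */
static int generate_streaming(FILE *f, size_t file_len, size_t blength,
                              struct channel *ch)
{
    assert(blength > 0);
    unsigned char *buf = malloc(blength);  /* the only block buffer */
    if (buf == NULL)
        return -1;

    /* the count follows from file length and block size alone */
    ch->count_sent = file_len / blength + (file_len % blength != 0);

    for (size_t off = 0; off < file_len; off += blength) {
        size_t want = file_len - off < blength ? file_len - off : blength;

        /* read a block */
        if (fread(buf, 1, want, f) != want) {
            free(buf);
            return -1;
        }

        /* calculate its checksum and fill the record */
        struct block_sum s = { weak_sum(buf, want), off, want };

        /* send it right away; nothing is kept after this iteration */
        send_sum(ch, &s);
    }

    free(buf);
    return 0;
}
```

The sender receives the same count and the same sequence of records as with a table, which is why the protocol need not change; only the generator's memory use and timing differ.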


> - perhaps if the network connection between rsync client and server
>   stalls for some reason, implement something like 'tcp keepalive'
>   feature ?

Not a good idea: the line should always be busy.
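For reference, the "tcp keepalive" asked about exists as the standard socket option `SO_KEEPALIVE`. A minimal POSIX C sketch of enabling and checking it follows; `enable_keepalive` and `keepalive_enabled` are hypothetical helper names. Note that RFC 1122 requires the default idle time before the first keepalive probe to be no less than two hours (Linux defaults to 7200 seconds), so with default settings keepalive would send nothing during a checksum pause of a few minutes.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Turn on SO_KEEPALIVE for a socket; returns 0 on success, -1 on error. */
static int enable_keepalive(int fd)
{
    assert(fd >= 0);
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on);
}

/* Returns 1 if SO_KEEPALIVE is set on fd, 0 if not, -1 on error. */
static int keepalive_enabled(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;
    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) != 0)
        return -1;
    return val != 0;
}
```

In practice it would be called on the connected socket once the connection to the server is set up.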


cu, Stefan
-- 
Stefan Nehlsen | ParlaNet Administration | sn at parlanet.de | +49 431 988-1260