rsync variation help

Tue Apr 22 18:36:18 EST 2003

Your message is not standards conformant.  Lines should not
exceed 78 chars.
On Tue, Apr 22, 2003 at 01:11:43AM -0700, Vikas Yadav wrote:
> 
> This message is for rsync developers who know the code
> very well. This is the first time I am using rsync. In
> short, I am trying to solve a problem by using existing
> rsync code. 
> 
> rsync does a tree walk and also checksums each "logical
> file block", whose size can be chosen at will. I want to
> use this existing framework to write a utility to
> maintain a checksum tree of a dataset.

Look into librsync.

> 
> Q. What do I mean by a checksum tree?  A. By that I mean
> to create a tree having the same structure as the source
> dataset. But all the files 

In other words you want rsync to cache the blocksums.

>    on the destination will contain a 128-bit checksum
>    instead of X bytes of actual block.  Preferably X will
>    be 4096 bytes. 

Broken.  It has already been proven that a fixed block size
is a bad idea.  For this reason rsync has dynamic per-file
block sizes.  Your cache will need to be aware of this.
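
As a rough illustration of what a per-file block size means for
a cache, here is a minimal Python sketch.  The square-root
scaling, the constants and the function names are assumptions
made for the example, not rsync's actual code.  The point is
only that two files of different length get different block
sizes, so the cache has to record the size that was used.

```python
import math

MIN_BLOCK = 700          # assumed floor, for illustration only
MAX_BLOCK = 128 * 1024   # assumed ceiling, for illustration only


def block_size_for(file_len):
    """Pick a block size that grows with the square root of the
    file length, clamped to [MIN_BLOCK, MAX_BLOCK]."""
    if file_len <= MIN_BLOCK * MIN_BLOCK:
        return MIN_BLOCK
    size = math.isqrt(file_len)
    size -= size % 8     # round down to a multiple of 8
    return max(MIN_BLOCK, min(size, MAX_BLOCK))


def block_count(file_len):
    """Number of blocks (the last one may be short)."""
    size = block_size_for(file_len)
    return (file_len + size - 1) // size
```

A cache built with a hard-wired 4096-byte block would not line
up with whatever size is chosen for a given file.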

>    The destination tree will match the source side in
>    structure. The only things that differ are the size of
>    the files and the data within them. The destination
>    contains only the checksums of the data.
> 
>    I am assuming that such a task should be easy using the
>    existing code of rsync. Let's call this new tool rsync*
>    for a moment.
> 
> Q. You have an rsync relationship between a dataset and
> its checksum tree. What should happen when I change the
> source dataset and run rsync* to modify the checksum tree
> on the destination?  A. Instead of calculating an md{5 or
> 4} checksum on the destination file, rsync* will read the
> checksum stored in the checksum tree and send that over to
> the source. Source should not send changed blocks to
> destination. On receiving changed blocks from the source,
> the destination should update the checksum tree with the
> new checksums for the corresponding blocks.

Blocks may be rearranged and new data inserted.  The entire
blocksum array will have to be recalculated.  And it needs
to be kept in sync with the file, even with multiple writers.
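
A small Python sketch of why an insertion invalidates the whole
array while an in-place overwrite does not.  MD5 from hashlib
stands in for the 128-bit checksum, the 4096-byte block size is
the one proposed in the quoted message, and the helper names are
made up for the example.

```python
import hashlib


def blocksums(data, block_size):
    """Return one MD5 digest per fixed-size block of data."""
    return [hashlib.md5(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def stale_blocks(old, new, block_size):
    """Count cached block checksums that no longer match,
    including blocks that were added or removed at the end."""
    before = blocksums(old, block_size)
    after = blocksums(new, block_size)
    changed = sum(1 for a, b in zip(before, after) if a != b)
    return changed + abs(len(before) - len(after))


# 16384 bytes of non-repeating data: four blocks of 4096 bytes.
original = b"".join(i.to_bytes(4, "big") for i in range(4096))
overwritten = b"X" + original[1:]   # one byte replaced in place
inserted = b"X" + original          # one byte inserted at front

print(stale_blocks(original, overwritten, 4096))
print(stale_blocks(original, inserted, 4096))
```

The overwrite touches a single block.  The one-byte insertion
shifts every later block boundary, so every cached checksum is
stale and a fifth block appears.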

> I think I have given an overview of what I want to achieve.
> Can someone on this list help me by telling me whether such
> a thing can be achieved with the current rsync code with
> little effort? 

It cannot.

> - If yes, where should I start looking? Where can I find
> the code which copies or writes blocks on the destination?
> Where can I find the code which calculates the checksums to
> be sent to the source? 
> 
> Thanx in advance for your help, Vikas

My second best recommendation would be to store the blocksum
cache as an extended attribute (EA) of the file itself.  That
requires a filesystem without arbitrary limits on EAs.
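
A minimal sketch of the EA approach in Python, assuming Linux,
where os.setxattr and os.getxattr are available.  The attribute
name, the header layout and the function names are hypothetical;
recording the file size and mtime is one assumed way to notice
that the cache no longer matches the file.

```python
import os
import struct

XATTR_NAME = "user.blocksum_cache"   # hypothetical attribute name
HEADER = struct.Struct("!QQI")       # size, mtime (ns), block size
DIGEST_LEN = 16                      # 128-bit checksums


def pack_cache(file_size, mtime_ns, block_size, digests):
    """Serialise the header and the digests into one EA value."""
    head = HEADER.pack(file_size, mtime_ns, block_size)
    return head + b"".join(digests)


def unpack_cache(value):
    """Inverse of pack_cache."""
    file_size, mtime_ns, block_size = HEADER.unpack_from(value)
    body = value[HEADER.size:]
    digests = [body[i:i + DIGEST_LEN]
               for i in range(0, len(body), DIGEST_LEN)]
    return file_size, mtime_ns, block_size, digests


def store_cache(path, block_size, digests):
    """Attach the cache to the file (Linux, user xattrs enabled)."""
    st = os.stat(path)
    value = pack_cache(st.st_size, st.st_mtime_ns,
                       block_size, digests)
    os.setxattr(path, XATTR_NAME, value)


def load_cache(path):
    """Return (block_size, digests), or None if missing or stale."""
    try:
        value = os.getxattr(path, XATTR_NAME)
    except OSError:
        return None
    size, mtime_ns, block_size, digests = unpack_cache(value)
    st = os.stat(path)
    if (size, mtime_ns) != (st.st_size, st.st_mtime_ns):
        return None      # file changed since the cache was written
    return block_size, digests
```

For scale: at 4096-byte blocks and 16 bytes per checksum, a
1 GiB file needs 4 MiB of EA space, which is why the limits a
filesystem puts on EA size matter here.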

My best recommendation is to consider the value of the whole
idea.  It only helps where you have an rsync daemon
receiving files over and over where only a small fraction
actually change.  The sender gains nothing from it.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt