need to modify file data before storing it on destination

jw schultz jw at pegasys.ws
Tue Apr 1 09:22:13 EST 2003


On Mon, Mar 31, 2003 at 01:16:50PM -0800, Kyle Jones wrote:
> jw schultz writes:
>  > On Mon, Mar 31, 2003 at 12:37:27PM -0800, Kyle Jones wrote:
>  > > I'd like to be able to store remote files compressed or encrypted
>  > > or both.  I think this could be supported in a general way by
>  > > having:
>  > > 
>  > >     1. an rsync option --remotefilter=command that specifies a
>  > >        remote command that rsync pushes file data through before
>  > >        storing it on disk.  This option would imply --whole-file.
>  > >     2. an rsync option --times-only so that rsync would consider
>  > >        only the modification time of the remote file when deciding
>  > >        whether to update it.  This is needed so that rsync would
>  > >        ignore the file size differences of compressed remote files.
>  > 
>  > Well, with that you have just disabled the primary
>  > characteristic of rsync.  I'm not saying that a utility that
>  > identifies changed files for compressed remote storage
>  > wouldn't be useful, just that this isn't rsync.
> 
> But isn't that also the case if --whole-file is used, an option that
> rsync already has?  The chief advantage of rsync over rdist is
> the checksum comparison code, which --whole-file disables.

Yes, sort of.  Rdist is also painfully slow because it
doesn't keep the pipeline full.  The whole-file option was
added primarily for non-network transfers, which now use it
automatically.  It is also helpful when syncing across fast
networks.

I'm sorry if I cited only one factor.  The other thing is
that rsync is about synchronizing so that both ends are the
same.  What you are talking about doesn't do that.  We seem
to get someone proposing some variant of this about every six
weeks, and so far each one has had its own problems.  Your
proposal would make the remote copy write-only, store
compressed files under their original names, and disable the
rsync algorithm.

What strikes me about most of the proposals is that what
they really want would be best implemented as an rsync-like
daemon: compatible with rsync, but storing the files/trees in
some sort of managed repository that could keep track of
original file sizes and of whether the files were already
compressed.  On the positive side, once you start dealing
with the out-of-band data you even have the possibility of
caching block checksums for write-often, read-seldom
repositories, and potentially of improving performance by
storing the file metadata in a form that is cheaper to
retrieve than stat(), reducing filesystem load.
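
A minimal sketch of such a managed repository, in Python, assuming a hypothetical layout: files are gzip-compressed under a data directory, and an index file that the caller keeps outside that directory records each file's original size, mtime, whether it was actually stored compressed, and cached block checksums of the uncompressed data. The names (Repository, Entry, BLOCK_SIZE) are illustrative only, and zlib.adler32/hashlib.md5 merely stand in for rsync's own rolling and strong checksums.

```python
import gzip
import hashlib
import json
import os
import zlib
from dataclasses import asdict, dataclass, field

BLOCK_SIZE = 4096  # hypothetical block size for the cached checksums


@dataclass
class Entry:
    orig_size: int    # size of the uncompressed file, used for the quick check
    mtime: int
    compressed: bool  # False if gzip could not shrink it (e.g. already compressed)
    block_sums: list = field(default_factory=list)  # [weak, strong] per block of the ORIGINAL data


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Checksum each block; adler32/md5 are stand-ins, not rsync's own checksums."""
    return [
        [zlib.adler32(data[off:off + block_size]),
         hashlib.md5(data[off:off + block_size]).hexdigest()]
        for off in range(0, len(data), block_size)
    ]


class Repository:
    """Files live under `root`; out-of-band metadata lives in an index
    file that the caller keeps outside that tree."""

    def __init__(self, root: str, index_path: str):
        self.root = root
        self.index_path = index_path
        self.index = {}
        if os.path.exists(index_path):
            with open(index_path) as f:
                self.index = {k: Entry(**v) for k, v in json.load(f).items()}

    def _path(self, relpath: str) -> str:
        return os.path.join(self.root, relpath)

    def needs_update(self, relpath: str, size: int, mtime: int) -> bool:
        # Quick check against the ORIGINAL size, not the compressed size on disk.
        e = self.index.get(relpath)
        return e is None or e.orig_size != size or e.mtime != mtime

    def store(self, relpath: str, data: bytes, mtime: int) -> None:
        packed = gzip.compress(data)
        compressed = len(packed) < len(data)  # keep the raw bytes if gzip didn't help
        dest = self._path(relpath)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, "wb") as f:
            f.write(packed if compressed else data)
        self.index[relpath] = Entry(len(data), mtime, compressed, block_checksums(data))
        self._save()

    def fetch(self, relpath: str) -> bytes:
        with open(self._path(relpath), "rb") as f:
            raw = f.read()
        return gzip.decompress(raw) if self.index[relpath].compressed else raw

    def _save(self) -> None:
        # Write-then-rename so a crash never leaves a half-written index.
        tmp = self.index_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({k: asdict(v) for k, v in self.index.items()}, f)
        os.replace(tmp, self.index_path)
```

Because needs_update() compares against the recorded original size, the smaller compressed file on disk never triggers a spurious transfer, which is the problem the proposed --times-only option was working around. Input that gzip cannot shrink is stored as-is and flagged, so it is not compressed twice.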

It is the out-of-band data that is the real killer.
Traditionally it would need to be stored either outside of
the file repository or in files that are somehow guaranteed
not to conflict with real files and not to overflow filename
length limits.  The addition of extended attributes to
filesystems opens up the possibility of avoiding the worst
of these limitations, at the cost of restricting
implementations to filesystems with compatible extended
attribute support.
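
A sketch of those two storage options for the out-of-band data, assuming Linux-style user extended attributes through Python's os.setxattr/os.getxattr, with a fallback to a parallel metadata tree outside the repository. The fallback tree mirrors each file's relative path exactly, so a metadata file can neither collide with a repository file nor lengthen its name. The attribute name user.repo.meta and the JSON encoding are hypothetical choices.

```python
import json
import os

ATTR = "user.repo.meta"  # hypothetical attribute name in the "user" namespace


def write_meta(data_root: str, meta_root: str, relpath: str, meta: dict) -> str:
    """Store metadata in an xattr if possible, else in a mirrored sidecar tree."""
    blob = json.dumps(meta).encode()
    path = os.path.join(data_root, relpath)
    if hasattr(os, "setxattr"):  # os.setxattr exists only on Linux
        try:
            os.setxattr(path, ATTR, blob)
            return "xattr"
        except OSError:
            pass  # no user-xattr support, or value too large; fall back
    side = os.path.join(meta_root, relpath)  # same relative path, separate tree
    os.makedirs(os.path.dirname(side), exist_ok=True)
    with open(side, "wb") as f:
        f.write(blob)
    return "sidecar"


def read_meta(data_root: str, meta_root: str, relpath: str) -> dict:
    path = os.path.join(data_root, relpath)
    if hasattr(os, "getxattr"):
        try:
            return json.loads(os.getxattr(path, ATTR))
        except OSError:
            pass  # attribute missing or unsupported; try the sidecar tree
    with open(os.path.join(meta_root, relpath), "rb") as f:
        return json.loads(f.read())
```

The xattr path keeps metadata attached to the file itself, but the size limit on attribute values varies by filesystem, so the sketch falls back to the separate tree whenever the attribute cannot be written.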

As I hope you can see, I am not exactly opposed to the idea
of compressed repositories with an rsync interface.  I just
don't think it belongs in the rsyncd utility.



-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt

