state of the rsync nation? (revisited 6/2003 from 11/2000)

Jeff Kowalczyk jtk at yahoo.com
Sun Jun 8 00:31:53 EST 2003


I'm interested in these very questions (the librsync-rsync relationship,
remaining limitations of rsync, active prospects for ground-up rewrites).
Google searches for rsync info have proved a little too vague due to the
program's ubiquity. Much has certainly changed since this was written.
Could some people with knowledge in these areas update Martin's
response for the state of rsync, June 2003? Thanks.

On 13 Nov 2000, Jason Ozolins wrote:
http://lists.samba.org/pipermail/rsync/2000-November/003147.html
> Just a quick question: is the librsync contained within the rproxy
> source code meant to be tracking the development of the mainstream
> rsync, or is it a stripped-down thing meant only to support rproxy?

On 13 Nov 2000, Martin Pool responded: Here's a quick history:

In the beginning was rsync, which is a file transfer protocol. At the
moment I look after the day-to-day stuff, and tridge watches the
evolution.

rsync gave rise to Josh Macdonald's XDelta, which is optimized for the
case where old and new versions are on the same machine, and so it can
generate more efficient deltas.

tridge extracted the algorithm into librsync, which I renamed to libhsync
when I changed the wire format.  The code currently checked in as librsync
is in my opinion not very good.  It tries to make the algorithm available
at various levels to programs that would like to use it, though the only
user at the moment is rproxy.  rsync doesn't use libhsync -- possibly it
never will, as we care enough about rsync performance that tighter
integration is justified.  Well, if we were starting from scratch it might
be separated out, but it's not worth doing it retrospectively now.

The problems with rsync at the moment are basically:

 * Quirks of design ('triangular' TCP sockets, etc) tend to provoke
   bugs in operating systems or remote shells.

 * Useful features have been added in ad-hoc, and so the code is
   fairly crufty in places.

 * People still want even more features for special cases.  To avoid
   feature hell, my opinion is that we need a clean scripting or plugin
   mechanism.

 * rsync is optimized for transferring relatively small trees
   (e.g. the rsync source tree) across slow links (e.g. 56kbps ppp). This
   is fine and important, but people want to use it for different
   situations (10GB, 100Mbps, 50 in parallel) where some design decisions
   (e.g. traverse the whole tree up front) are no longer optimal or even
   adequate.
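
To illustrate the last point, here is a minimal sketch of the difference
between building the whole file list up front and producing it
incrementally. It is not rsync's code; the function names
list_up_front and list_incrementally are hypothetical, and it only
assumes Python's standard os.walk.

```python
import os


def list_up_front(root):
    """Build the complete file list in memory before anything is sent.

    Memory use and start-up delay grow with the size of the tree.
    """
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return paths


def list_incrementally(root):
    """Yield one path at a time, so a transfer could start immediately.

    Only the current directory's entries are held at any moment.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Both functions visit the same files; the difference is only when the
caller gets to see the first one, which is what matters for a 10GB tree.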

rproxy uses the rsync algorithm to improve HTTP caching -- it's not
rsync-over-HTTP.  I'm the lead developer for it, and it's in beta.

Completely unrelated to rproxy, sfr has added a small feature to tunnel
rsync through HTTP CONNECT proxies.

Therefore, some people at Linuxcare (primarily rusty, tridge and myself)
are looking at a ground-up rewrite with new code and a new network
protocol.  (Of course we will have a fallback mode.)  This might be called
rsync-3.0, or rsync-tng, or tsync, or something else.

This will likely be a more traditional client-server protocol, somewhat
similar to FTP and HTTP in that the client sends commands to the server to
put or get files.  However, commands will be pipelined and in a
network-independent binary format, and will use only a single TCP
connection. In general we hope that there will be fewer special cases,
and probably that there will be less application-level intelligence in
the server and more in the client.  This should be a firmer foundation
for building things
such as

 * implementations in different languages/platforms (Java, Win32
   native, INTERCAL, ...)

 * interactive rsync (like ftp(1))

 * two-way rsync (controlled by the client, which could be automatic
   or even have a GUI.)

 * rsync as a transport for things such as CVS
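
As a rough sketch of what pipelined binary commands over a single
connection could look like, the following frames each command as a
one-byte opcode plus a four-byte length in network byte order. This is
purely illustrative: the frame layout, the opcodes OP_GET and OP_PUT,
and the function names are all assumptions, not the protocol described
above or anything rsync actually uses.

```python
import struct

# Assumed frame header: opcode (1 byte) + payload length (4 bytes),
# packed in network (big-endian) byte order with no padding.
HEADER = struct.Struct("!BI")

# Hypothetical opcodes for illustration only.
OP_GET = 1
OP_PUT = 2


def encode_command(opcode, payload):
    """Return one framed command as bytes."""
    return HEADER.pack(opcode, len(payload)) + payload


def decode_commands(buffer):
    """Split a byte buffer into complete commands.

    Returns (commands, leftover), where leftover holds any trailing
    partial frame that needs more data from the connection.
    """
    commands = []
    offset = 0
    while offset + HEADER.size <= len(buffer):
        opcode, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # frame not fully received yet
        commands.append((opcode, buffer[offset + HEADER.size:end]))
        offset = end
    return commands, buffer[offset:]
```

Pipelining here just means the client can write several encoded
commands back to back without waiting for each reply; the receiver
pulls complete frames out of whatever has arrived so far.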

Discussion about either feature requests or implementation ideas would be
very welcome.  It's probably best to send them to the rsync mailing list.

> The reason I ask is that I am thinking of extending Bob Edwards'
> rsync-based backup server architecture here at DCS, using a database to
> hold file metadata, doing binary deltas for history, and doing block
> compression on backed up data.  This is a fair amount of stuff to
> change, and I was wondering which source base would be better to start
> with.

You might like to look at the XDelta work on XDFS and PCVS, or in the
longer term to work on rsync 3.0.