superlifter design notes and a new proposal

Martin Pool mbp at toey.home
Mon Aug 5 02:10:01 EST 2002

On  4 Aug 2002, Wayne Davison <wayned at> wrote:
> On Sun, 4 Aug 2002, Martin Pool wrote:
> > My first draft was proposing what you might call a "fine-grained" rpc
> > system, with operations like "list this directory", "delete this
> > file", "calculate the checksum of this file."  I think Wayne's rzync
> > system was kind of like that too.
> Your previous proposal sounded quite a bit more fine-grained than what
> rZync is doing.  For instance, it sounded like you would have much more
> primitive building-block messages and move much of the controlling
> smarts into something like a python-language scripting layer.  While
> rZync allows ftp-level control (such as "send this file", "send this
> directory tree", "delete this file", "create this directory") it does
> this with a small number of higher-level command messages.
> Rsync, as you know, is a much more modal protocol.  It has a strict set
> of steps that must be specified in order and nothing else.  This saves
> bytes because so much of the protocol is determined by context, but is
> very limiting.

I think there are actually two problems with it:

 - It's very limiting.

 - When you *do* introduce new features, they are very disruptive to
   the code.  At the moment there is really just one big convoluted
   code path through rsync, and every new option adds more twists and
   branches.  It would be far preferable, to my mind, if new features
   could be added as new opcodes (or something similar) without
   touching the existing code path.
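To illustrate the difference, here is a minimal sketch of opcode-based dispatch. Each opcode maps to its own handler, so a new feature becomes a new table entry rather than another branch in a single code path. The opcode names `DELE` and `MKDR`, and the helpers `opcode` and `dispatch`, are hypothetical and are not taken from rsync or rZync.

```python
from typing import Callable, Dict

# A handler receives the packet's file sequence number and its data.
Handler = Callable[[int, bytes], str]

# Hypothetical registry: four-byte opcode mnemonic -> handler.
HANDLERS: Dict[bytes, Handler] = {}


def opcode(name: bytes) -> Callable[[Handler], Handler]:
    """Register a handler under a four-byte opcode mnemonic."""
    if len(name) != 4:
        raise ValueError("opcode must be exactly four bytes")

    def register(fn: Handler) -> Handler:
        HANDLERS[name] = fn
        return fn

    return register


@opcode(b"DELE")  # hypothetical "delete this file" opcode
def delete_file(file_seq: int, data: bytes) -> str:
    return f"delete file #{file_seq}"


@opcode(b"MKDR")  # hypothetical "create this directory" opcode
def make_dir(file_seq: int, data: bytes) -> str:
    return f"mkdir {data.decode()}"


def dispatch(op: bytes, file_seq: int, data: bytes) -> str:
    """Look up the handler for an opcode; reject anything unknown."""
    handler = HANDLERS.get(op)
    if handler is None:
        raise ValueError(f"unknown opcode {op!r}")
    return handler(file_seq, data)
```

Adding a feature in this style means writing one more decorated function. Existing handlers and the dispatch loop are left alone.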

I also had a look at the byte-level documentation for rZync again.
Can I persuade you to switch to a more XDR-like format?


  uint32 opcode
  uint32 file_seq
  uint32 data_length
  byte   data[]

  The message number (perhaps "opcode" or "command" would be a clearer
  name) is four bytes, holding a four-character ASCII mnemonic.

  The file sequence number and data length are always four bytes each.

  Data is padded with 0-3 nul bytes to bring it to a four-byte boundary.
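A minimal sketch of packing and unpacking this layout in Python follows. It assumes big-endian (network order) integers as in XDR, and it assumes that `data_length` counts only the real data bytes, not the padding. The proposal above does not settle either point. The opcode `MKDR` and the function names are hypothetical.

```python
import struct

# opcode (4 ASCII bytes), file_seq (uint32), data_length (uint32);
# big-endian byte order is an assumption borrowed from XDR.
HEADER = struct.Struct(">4sII")


def pad_len(n: int) -> int:
    """Number of NUL bytes (0-3) needed to reach a four-byte boundary."""
    return -n % 4


def encode_packet(opcode: bytes, file_seq: int, data: bytes) -> bytes:
    """Build header + data + padding. data_length excludes padding (assumed)."""
    if len(opcode) != 4 or not opcode.isascii():
        raise ValueError("opcode must be a 4-char ASCII mnemonic")
    header = HEADER.pack(opcode, file_seq, len(data))
    return header + data + b"\0" * pad_len(len(data))


def decode_packet(buf: bytes, offset: int = 0):
    """Decode one packet at offset; return (opcode, file_seq, data, next_offset)."""
    opcode, file_seq, length = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    end = start + length
    if end > len(buf):
        raise ValueError("truncated packet")
    data = buf[start:end]
    # Skip the padding so next_offset lands on the following header.
    return opcode, file_seq, data, end + pad_len(length)
```

For example, `encode_packet(b"MKDR", 1, b"src")` produces 16 bytes: a 12-byte header, three data bytes and one NUL. Its `.hex()` starts with `4d4b4452`, the ASCII bytes `MKDR`, which is why such packets are easy to spot in a hex dump.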

I think this gives us several wins:

 - Because the opcode is a four-character mnemonic with "more entropy"
   than a single byte such as 0 or 2, it will basically never occur
   "accidentally" in the stream.  Checking it is essentially a sanity
   check at the start of every packet.

 - Because the length field is a full 32 bits, if we later decide that
   packets larger than 32kB would make sense, we can allow them without
   changing the protocol.

 - Packing and unpacking fixed-size, aligned fields will use much less
   CPU, and the code will be more self-evidently correct.

 - It's easier to read in a hex dump.

 - Compressing the stream with lzo or gzip will squish these packets to
   a size similar to a more tightly packed encoding, so micro-optimizing
   the byte layout is not necessary.
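The sanity-check and packet-size points can be sketched as a header check run at the start of every packet. This is a minimal Python sketch. The set of known mnemonics, the 32kB limit constant and the function name `check_header` are all hypothetical, and big-endian byte order is assumed as in XDR. If a reader loses sync by even one byte, the four bytes it takes for the opcode are very unlikely to form a known mnemonic, so the error is caught immediately. The size limit is only a policy constant, because the 32-bit length field can already describe packets of up to 2**32 - 1 bytes.

```python
import struct

# opcode, file_seq, data_length; big-endian is an assumption (as in XDR).
HEADER = struct.Struct(">4sII")
KNOWN_OPCODES = {b"DELE", b"MKDR", b"SEND"}  # hypothetical mnemonics
# Policy limit only; raising it does not change the wire format.
MAX_DATA = 32 * 1024


def check_header(buf: bytes, offset: int = 0):
    """Return (opcode, file_seq, length), or raise if the header looks wrong."""
    opcode, file_seq, length = HEADER.unpack_from(buf, offset)
    if opcode not in KNOWN_OPCODES:
        raise ValueError(f"lost sync: bad opcode {opcode!r} at offset {offset}")
    if length > MAX_DATA:
        raise ValueError(f"packet of {length} bytes exceeds limit")
    return opcode, file_seq, length
```

Reading the same buffer one byte late sees the opcode `b"END\x00"` instead of `b"SEND"`, and the check rejects it.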


More information about the rsync mailing list