superlifter design notes and a new proposal
Martin Pool
mbp at toey.home
Mon Aug 5 02:10:01 EST 2002
On 4 Aug 2002, Wayne Davison <wayned at users.sourceforge.net> wrote:
> On Sun, 4 Aug 2002, Martin Pool wrote:
> > My first draft was proposing what you might call a "fine-grained" rpc
> > system, with operations like "list this directory", "delete this
> > file", "calculate the checksum of this file." I think Wayne's rzync
> > system was kind of like that too.
>
> Your previous proposal sounded quite a bit more fine-grained than what
> rZync is doing. For instance, it sounded like you would have much more
> primitive building-block messages and move much of the controlling
> smarts into something like a python-language scripting layer. While
> rZync allows ftp-level control (such as "send this file", "send this
> directory tree", "delete this file", "create this directory") it does
> this with a small number of higher-level command messages.
>
> Rsync, as you know, is a much more modal protocol. It has a strict set
> of steps that must be specified in order and nothing else. This saves
> bytes because so much of the protocol is determined by context, but is
> very limiting.

I think there are actually two problems with it:

 - It's very limiting

 - When you *do* introduce new features, they are very disruptive to
   the code.  At the moment, there is really just one big convoluted
   code path through rsync, and adding options adds more twists and
   branches.  It would be far preferable, to my mind, if you could
   add new opcodes or whatever.

I also had a look at the byte-level documentation for rZync again.
Can I persuade you to switch to a more XDR-like format?
So:

  uint32 opcode
  uint32 file_seq
  uint32 data_length
  byte   data[]

The message # (or perhaps "opcode" is a clearer name, or "command"?) is
four bytes, being a 4-char ASCII mnemonic.
Data length and file sequence are always four bytes each.
Data is padded with 0-3 NUL bytes to bring it to a four-byte boundary.
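
As a rough illustration of the layout above, here is a minimal Python
sketch of packing and unpacking one message. Several details are
assumptions rather than part of the proposal: the fields are taken to
be big-endian (as in XDR), data_length is taken to count the unpadded
data only, and the function names and the "SEND"/"DELE" mnemonics are
made up for the example.

```python
import struct

# Header: 4-char ASCII opcode, file_seq, data_length.
# Big-endian is an assumption borrowed from XDR; the proposal only says "XDR-like".
HEADER = struct.Struct(">4sII")


def pack_message(opcode: bytes, file_seq: int, data: bytes) -> bytes:
    """Build one message: 12-byte header, data, then 0-3 NUL pad bytes."""
    if len(opcode) != 4:
        raise ValueError("opcode must be a 4-byte ASCII mnemonic")
    pad = -len(data) % 4  # bytes needed to reach a four-byte boundary
    # Assumption: data_length records the unpadded length.
    return HEADER.pack(opcode, file_seq, len(data)) + data + b"\0" * pad


def unpack_message(buf: bytes, offset: int = 0):
    """Parse one message at offset; return (opcode, file_seq, data, next_offset)."""
    opcode, file_seq, length = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    end = start + length
    if end > len(buf):
        raise ValueError("truncated message")
    # Skip the padding so next_offset lands on the following header.
    next_offset = end + (-length % 4)
    return opcode, file_seq, buf[start:end], next_offset


if __name__ == "__main__":
    msg = pack_message(b"SEND", 7, b"hello")  # "SEND" is a hypothetical mnemonic
    print(msg.hex(" ", 4))
    print(unpack_message(msg))
```

Because every field is a fixed four bytes and the data is padded, the
next header always starts on a four-byte boundary, which is what keeps
the parsing this short.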
I think this gives us several wins:

 - Giving the opcode/message # as something with "more entropy" means
   it will basically never occur "accidentally" in the stream, unlike
   a byte like 0 or 2.  This is essentially a sanity check at the
   start of every packet.

 - Later on, if we decide that packets larger than 32kB would make
   sense, we can do it without changing the protocol.

 - Packing/unpacking these things will use much less CPU; and the code
   will be more self-evidently correct.

 - It's easier to read in a hex dump.

 - Running the stream through lzo or gzip will squish these headers to
   about the same size as a hand-packed format, so micro-optimization
   is not necessary.

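
To make the sanity-check point concrete, here is a small Python sketch
of validating a header before trusting it. The set of known mnemonics
and the check_header name are hypothetical, and big-endian field order
is again an assumption.

```python
import struct

# Hypothetical set of 4-char mnemonics, for illustration only.
KNOWN_OPCODES = {b"SEND", b"DELE", b"MKDR"}


def check_header(buf: bytes):
    """Parse a 12-byte header and reject it if the opcode is not a known mnemonic."""
    opcode, file_seq, length = struct.unpack_from(">4sII", buf)
    if opcode not in KNOWN_OPCODES:
        # A stray small integer such as 0 or 2 can never match a mnemonic,
        # so a desynchronized stream is caught at the first bad header.
        raise ValueError(f"unexpected opcode {opcode!r}; stream out of sync?")
    return opcode, file_seq, length


if __name__ == "__main__":
    # A 32-bit length field also covers packets well past 32kB.
    header = struct.pack(">4sII", b"SEND", 1, 70000)
    print(check_header(header))
```

The mnemonic also shows up as readable ASCII at the start of each
packet in a hex dump.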
--
Martin