superlifter design notes and a new proposal

Mon Aug 5 19:14:02 EST 2002

When I talk about compressing message headers, I have in mind
compressing everything that goes across the socket, after some kind of
initial negotiation phase.  Imagine perhaps a request that says that
everything after that point will be compressed with a specified codec
(lzo, gzip, or bzip2).

I think this is a desirable way to approach it because it is simple to
implement and understand, and it compresses headers, filenames, and
file data alike.
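
A minimal sketch of the idea, assuming the negotiation has already
settled on a codec name.  The codec table and the make_codec() and
roundtrip() helpers are hypothetical, and lzo is left out only because
Python's standard library has no binding for it:

```python
import bz2
import zlib

# Hypothetical codec table; the names are illustrative, not an agreed
# protocol.  "gzip" here means a zlib/deflate stream, the algorithm
# that gzip uses.
CODECS = {
    "gzip": (zlib.compressobj, zlib.decompressobj),
    "bzip2": (bz2.BZ2Compressor, bz2.BZ2Decompressor),
}


def make_codec(name):
    """Return a (compressor, decompressor) pair for a negotiated codec."""
    make_comp, make_decomp = CODECS[name]
    return make_comp(), make_decomp()


def roundtrip(name, chunks):
    """Push every chunk through one compressed stream and back again."""
    comp, decomp = make_codec(name)
    # Headers, filenames and file data all go through the same stream.
    wire = b"".join(comp.compress(c) for c in chunks) + comp.flush()
    return wire, decomp.decompress(wire)
```

Everything written after the negotiation goes through the one
compressor, so the rest of the program does not need to know which
bytes are headers and which are data.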

If compression is done using lzo I think the CPU overhead will be very
small compared to processing done in other parts of the program.  I
have done some very rough experiments on this for distcc.  I guess I
ought to do them in rZync.

Regardless of whether we compress or not, the current header scheme
feels very much like premature optimization, which as Knuth said is
the root of all evil.  (Of course, so is arguing without empirical
data. :-)

I really want to get away from bumming out every single byte as if we
were going over 33k6 modems, and try to get a protocol that will work
well for large data sets and that will evolve well in the future.  I
know you want that too, Wayne, but I think micro-optimization is a
step in the wrong direction.

Fast networks are a significant use case that ought to be considered.
On GigE, having low processing overhead and being able to use things
like sendfile() to transmit data are going to be enormously more
important than the length of the header.

In most transfers, we would hope that message headers are not a very
large fraction of bytes transferred.  It ought to be mostly checksums,
or file data.  It doesn't matter much whether they're three bytes or
twelve bytes if they're only 2% of the total data.  
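
To put rough numbers on that, here is a small calculation.  The body
sizes are illustrative assumptions, not measurements from rsync or
rZync:

```python
def header_fraction(header_len, body_len):
    """Fraction of the bytes on the wire that are header bytes."""
    return header_len / (header_len + body_len)


# Compare a 3-byte and a 12-byte header over a few assumed body sizes.
for body_len in (100, 1024, 32 * 1024):
    small = header_fraction(3, body_len)
    large = header_fraction(12, body_len)
    print(f"{body_len:6d}-byte body: "
          f"3-byte header {small:.2%}, 12-byte header {large:.2%}")
```

With a 32kB body, either header is well under 0.1% of the message.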

You can imagine pathological cases where the headers do comprise most
of the data, and so the size of the headers might really make a
difference to the time to transfer.  However, in that case we may pay
a significant CPU cost for the extra complexity of handling
variable-length headers, so it is not clearly better.  In any case,
being able to fill a 32kB pipeline with 3-byte messages may be beyond
us.

There is also this issue of compression causing us not to know how
much readahead we need to get a packet.  This is already a problem
with variable-length headers: we have to read a few bytes, analyze
them, and then possibly read some more.  Therefore, things either have
to go through an internal buffer, or we have to make multiple system
calls.  Neither of these is good for high-speed networks.

With a fixed-length XDR-like header, processing (without compression)
is very simple: read 12 bytes, and then directly read the data out of
them.  If there is body data, we will be in exactly the right place on
the socket to read it directly.
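
A sketch of that read path follows.  The field layout (type, flags,
body length as three 32-bit big-endian words) is an assumption made
for illustration; nothing in this thread fixes it.  The read_exact()
helper is also hypothetical:

```python
import io
import struct

# Assumed layout: three unsigned 32-bit words in network byte order,
# XDR style: message type, flags, body length.  HEADER.size is 12.
HEADER = struct.Struct("!III")


def read_exact(stream, n):
    """Read exactly n bytes from a file-like object or raise EOFError."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream closed mid-message")
        buf += chunk
    return buf


def send_msg(stream, msg_type, flags, body=b""):
    """Write one fixed-length header followed by the body, if any."""
    stream.write(HEADER.pack(msg_type, flags, len(body)) + body)


def read_msg(stream):
    """Read 12 header bytes, then exactly as many body bytes as they say."""
    msg_type, flags, length = HEADER.unpack(read_exact(stream, HEADER.size))
    body = read_exact(stream, length) if length else b""
    return msg_type, flags, body


# An in-memory stream stands in for the socket here.
wire = io.BytesIO()
send_msg(wire, 7, 0, b"hello")
wire.seek(0)
print(read_msg(wire))
```

There is no lookahead and no internal buffer: after read_msg()
returns, the stream is positioned at the start of the next header.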

As somebody said earlier in the thread, network-endianness and XDR
were invented for a reason, and a lot of thought went into them.  We
shouldn't ignore it without a good reason.

> I could be persuaded as long as the transferred-byte overhead is kept
> under control.

I'm not proposing hundred-byte headers or anything crazy; I think 12
or 16 bytes per message is "under control".

> The nice thing about the on-the-wire protocol is that it is a small,
> self-contained sub-problem that does not affect the rest of the app.
> It's essentially a black-box in between the send_msg() and
> read_msg() functions, and as long as the arg values make it from one
> side to the other dependably, the rest of the code doesn't care how
> it happens.

That's good.

> The current byte-level code in rZync is rather cryptic, but I wanted to
> keep the byte-overhead near to rsync in my tests so I could see how the
> single-process-per-side design was performing.  Thus, it does go to
> pains to squish the bytes out of the headers wherever possible.

Even if your experiments show the cryptic format is better than
XDR-like, I'd argue we should still go for the more straightforward
format to ease future development.  I strongly feel this is a lesson
we ought to learn from the current program.  Can you really be sure
that in a few years' time we will not need more than 64 message types,
or data segments longer than 32kB?

> I'm somewhat cautious about adding a compression layer over the top of
> the message headers.  It should be doable, but it will make things a
> bit more complex.  For instance, my current code reads one message at a
> time because it understands the header format and knows how long the
> data will be from that understanding.

> A zlib deflate/inflate that included the message headers themselves
> will mean that we need to slurp as much data as is available on the
> socket, decompress as much of that as possible (which might not be
> all the currently-read data), and then operate on as many messages
> as we find in the decompressed data.

Yes, that's true.
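
A sketch of what that receive loop might look like, assuming a zlib
stream and the same hypothetical 12-byte header (type, flags, body
length).  The CompressedReader class is made up for illustration:

```python
import struct
import zlib

HEADER = struct.Struct("!III")  # assumed: type, flags, body length


class CompressedReader:
    """Inflate whatever arrives and return every complete message in it."""

    def __init__(self):
        self._inflate = zlib.decompressobj()
        self._plain = b""  # decompressed bytes not yet consumed

    def feed(self, wire_bytes):
        # Decompress as much as the input allows; this may end in the
        # middle of a header or body.
        self._plain += self._inflate.decompress(wire_bytes)
        messages = []
        while len(self._plain) >= HEADER.size:
            msg_type, flags, length = HEADER.unpack_from(self._plain)
            end = HEADER.size + length
            if len(self._plain) < end:
                break  # body incomplete; wait for more input
            messages.append((msg_type, flags, self._plain[HEADER.size:end]))
            self._plain = self._plain[end:]
        return messages
```

Each call to feed() may return zero, one, or several messages, which
is exactly the extra buffering being described.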

I guess the alternative you have in mind is just to compress the data
field on particular packets?  Would you start a new compressor for
each one?  Presumably not.

That would also be reasonable, and perhaps better by enabling
fixed-length readahead.  That would work equally well with XDR-like
headers.  I don't have a strong preference at the moment for
compression of data fields or of the whole stream.
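
A sketch of the data-field variant, assuming zlib, one compressor
shared across all packets, and a hypothetical 12-byte header whose
last word carries the compressed body length.  The class names are
invented for illustration:

```python
import struct
import zlib

HEADER = struct.Struct("!III")  # assumed: type, flags, compressed length


class BodyCompressor:
    """Sender side: one deflate stream shared by all message bodies."""

    def __init__(self):
        self._deflate = zlib.compressobj()

    def pack(self, msg_type, flags, body):
        # Z_SYNC_FLUSH emits everything buffered so far, so the peer
        # can inflate this body on arrival, while the compression
        # history carries over to later bodies.
        packed = (self._deflate.compress(body)
                  + self._deflate.flush(zlib.Z_SYNC_FLUSH))
        return HEADER.pack(msg_type, flags, len(packed)) + packed


class BodyDecompressor:
    """Receiver side: the matching inflate stream."""

    def __init__(self):
        self._inflate = zlib.decompressobj()

    def unpack(self, packet):
        # The header is plain, so its length field says exactly how
        # many bytes to read before touching the decompressor.
        msg_type, flags, length = HEADER.unpack_from(packet)
        start = HEADER.size
        body = self._inflate.decompress(packet[start:start + length])
        return msg_type, flags, body
```

Packets must be unpacked in the order they were packed, since both
ends share one compression history.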

> We also have to be sure that we flush the data on the compression
> side often enough that we avoid deadlocking waiting for new message
> headers that aren't coming (i.e. zlib adds a buffering layer of its
> own).

> If we make it flush once per message, that would probably be very
> byte-inefficient for smaller messages.

> If we instead only force a flush when a pause in the action occurs,
> that would probably be optimal, but might add delays to the transfer
> and would add overall complexity.

I think the situation is similar to running over SSH, where there is
some buffering inside SSH, and in the pipe or socketpair from us to
SSH.
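
The cost of the two flush policies is easy to measure.  This is a
rough sketch using zlib; the workload of 1000 identical 12-byte
messages is an invented example, and wire_size() is a hypothetical
helper:

```python
import zlib


def wire_size(messages, flush_every_message):
    """Bytes emitted by one deflate stream under a given flush policy."""
    comp = zlib.compressobj()
    total = 0
    for msg in messages:
        total += len(comp.compress(msg))
        if flush_every_message:
            # Force out everything buffered so far, so the peer can
            # decode this message without waiting for the next one.
            total += len(comp.flush(zlib.Z_SYNC_FLUSH))
    if not flush_every_message:
        # Flush once, standing in for "a pause in the action".
        total += len(comp.flush(zlib.Z_SYNC_FLUSH))
    return total


# Illustrative workload: many tiny, similar messages.
messages = [b"\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x00"] * 1000
print("flush per message:", wire_size(messages, True))
print("flush at the end: ", wire_size(messages, False))
```

Every sync flush adds several bytes of block framing, so flushing
after each tiny message costs far more than flushing once.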

> >  - Packing/unpacking these things will use much less CPU
> 
> I'm not sure that this will be true if the headers are then
> compressed with something like zlib.

For zlib, no; for lzo my very rough measurements show that it may
actually be cheaper.  I will try to write a little testbench and post
it.  But my main argument against crunched headers is not that they
fail to save bytes, but rather that they are a premature
optimization.

-- 
Martin 

Facts are meaningless. You could use facts to prove anything that's
even remotely true.
	-- Homer Simpson