superlifter design notes and a new proposal
jw at pegasys.ws
Mon Aug 5 15:31:02 EST 2002
On Mon, Aug 05, 2002 at 01:01:22PM -0700, Wayne Davison wrote:
> On Mon, 5 Aug 2002, Martin Pool wrote:
> > - Using lzo or gzip will squish them to a similar size, so
> > micro-optimization is not necessary.
> I'm somewhat cautious about adding a compression layer over the top of
> the message headers. It should be doable, but it will make things a
> bit more complex. For instance, my current code reads one message at a
> time because it understands the header format and knows how long the
> data will be from that understanding. A zlib deflate/inflate that
> included the message headers themselves will mean that we need to slurp
> as much data as is available on the socket, decompress as much of that
> as possible (which might not be all the currently-read data), and then
> operate on as many messages as we find in the decompressed data. We
> also have to be sure that we flush the data on the compression side
> often enough that we avoid deadlocking waiting for new message headers
> that aren't coming (i.e. zlib adds a buffering layer of its own). If
> we make it flush once per message, that would probably be very byte-
> inefficient for smaller messages. If we instead only force a flush
> when a pause in the action occurs, that would probably be optimal, but
> might add delays to the transfer and would add overall complexity.
Perhaps a flush at each chdir?
For compressed data I would want at least part of the header
left uncompressed. At a minimum:
    u16 data_len; /* uncompressed length */
    u16 xmit_len; /* compressed length */
so you could manage memory and read() operations.
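
A minimal sketch of that two-field header, to make the idea concrete. The
names msg_hdr, hdr_pack, hdr_unpack and HDR_LEN are hypothetical, and
big-endian byte order on the wire is an assumption; only the two u16 fields
come from the proposal above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { HDR_LEN = 4 };   /* two u16 fields, always sent uncompressed */

struct msg_hdr {
    uint16_t data_len;  /* uncompressed length */
    uint16_t xmit_len;  /* compressed length, i.e. bytes that follow */
};

/* Serialize the header into buf, big-endian (assumed wire order). */
static void hdr_pack(const struct msg_hdr *h, unsigned char buf[HDR_LEN])
{
    assert(h != NULL && buf != NULL);
    buf[0] = (unsigned char)(h->data_len >> 8);
    buf[1] = (unsigned char)(h->data_len & 0xff);
    buf[2] = (unsigned char)(h->xmit_len >> 8);
    buf[3] = (unsigned char)(h->xmit_len & 0xff);
}

/* Parse the 4 header bytes back into the struct. */
static void hdr_unpack(const unsigned char buf[HDR_LEN], struct msg_hdr *h)
{
    assert(h != NULL && buf != NULL);
    h->data_len = (uint16_t)((buf[0] << 8) | buf[1]);
    h->xmit_len = (uint16_t)((buf[2] << 8) | buf[3]);
}
```

With something like this the receiver can read exactly HDR_LEN bytes, then
read exactly xmit_len bytes of compressed payload, and size the output
buffer from data_len before inflating, so no blind socket reads are needed.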
This could all be hidden behind (get|send)_msg() so the
callers aren't affected. If so, the small-message
inefficiencies might be mitigated by clumping messages together.
The clumping would still have to be subject to flushing, to get
rid of the latency-dependent deadlocks.
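
A rough sketch of the clumping idea, under stated assumptions: struct batch,
send_msg_batched, batch_flush, BATCH_MAX and the emit callback are all
hypothetical names, not anything in rsync or rZync. The emit callback stands
in for "compress the clump and write it to the socket"; here it only counts
calls so the behaviour is easy to check.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BATCH_MAX = 4096 };  /* arbitrary clump size for illustration */

struct batch {
    unsigned char buf[BATCH_MAX];
    size_t used;
    /* hypothetical sink: would compress buf and write it out */
    void (*emit)(const unsigned char *data, size_t len, void *ctx);
    void *ctx;
};

/* Push out whatever has accumulated; a no-op on an empty clump. */
static void batch_flush(struct batch *b)
{
    if (b->used > 0) {
        b->emit(b->buf, b->used, b->ctx);
        b->used = 0;
    }
}

/* Append one small message; flush first if it would not fit. */
static void send_msg_batched(struct batch *b, const void *msg, size_t len)
{
    assert(len <= BATCH_MAX);
    if (b->used + len > BATCH_MAX)
        batch_flush(b);
    memcpy(b->buf + b->used, msg, len);
    b->used += len;
}

/* Stand-in sink that only records what it was handed. */
struct emit_stats { size_t calls; size_t bytes; };

static void count_emit(const unsigned char *data, size_t len, void *ctx)
{
    struct emit_stats *st = ctx;
    (void)data;
    st->calls++;
    st->bytes += len;
}
```

The caller would invoke batch_flush() whenever it is about to wait for a
reply (or, say, at each chdir), which is what keeps the other side from
stalling on messages that are still sitting in the clump.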
> > - Packing/unpacking these things will use much less CPU
> I'm not sure that this will be true if the headers are then compressed
> with something like zlib.
> So, I think the tradeoffs between the two schemes are closer than you
> might think. My variable-width message header scheme is essentially a
> very specific compression scheme placed over the top of the arg values
> that come in to the send_msg() function. I think that using your idea
> of fixed-width header fields and a zlib (or whatever) compression over
> the top would be able to surpass my current implementation in byte
> efficiency over the wire, but will be slightly more complex to implement
> (due to the extra buffering layer and the blind socket reads).
> In the future, we should be able to code up a replacement for the
> read_msg()/write_msg() functions and just plug it into the code and try
> it out (and also have a nice test suite that exercised these functions
> and checked the resulting values). In the present my rZync code saves a
> buffer copy by exploiting knowledge of how the send_msg() function
> works, so I'd have to change the sig/delta code first before we could
> try out a replacement (I know, I know; I'm an optimizing-fool sometimes).
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt