superlifter design notes and a new proposal

jw schultz jw at pegasys.ws
Mon Aug 5 21:20:02 EST 2002


On Tue, Aug 06, 2002 at 01:58:17PM +1000, Martin Pool wrote:
> On  5 Aug 2002, jw schultz <jw at pegasys.ws> wrote:
> > On Tue, Aug 06, 2002 at 01:09:46PM +1000, Martin Pool wrote:
> > > On  5 Aug 2002, Paul Haas <paulh at hamjudo.com> wrote:
> > > 
> > > > > Fast networks are a significant use case that ought to be considered.
> > > > > On GigE, having low processing overhead and being able to use things
> > > > > like sendfile() to transmit data are going to be enormously more
> > > > > important than the length of the header.
> > 
> > I concur.  GigE is a while away for most transfers (except
> > the TB+ class), but I think we can say that the vast majority
> > of rsync runs fall into one of two categories: 100Mb switched
> > LANs or 384Kb-1.4Mb internet links.
> 
> Yes.  We need to lead the target a little bit though: GigE is already
> something I could afford to run at home; in a few years it will
> probably be commonplace at least in the machine room.

My main point here is that we either run over a fast link,
where compression isn't buying us much, or a slow link,
where it buys a lot.  At 100Mb it is borderline, depending
on content and CPU speed.  I find that current rsync with
no compression on a 300 MHz CPU saturates the CPU, not the
wire.
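
Rough numbers, to make that concrete (the compression
throughput is an assumed figure; it varies with content and
zlib level):

	100 Mb/s wire           ~12.5 MB/s of payload
	zlib deflate, ~300 MHz  perhaps 3-5 MB/s (assumed)
	384Kb-1.4Mb link        ~48-175 KB/s

	=> at 100Mb the CPU is the bottleneck with compression
	   on (and, per the above, often even with it off); on
	   the slow links, compression effort is nearly free.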


> > > > Is there a big real world CPU penalty for letting ssh do the
> > > > compression?
> > > 
> > > I doubt it.
> 
> To be more clear, I wasn't saying that it would be more efficient to
> do compression in SSH; it won't be.  However, if you have SSH set up
> to always compress (as I do), then it is not much worse to do an
> uncompressed rsync across that.
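
(For reference, ssh-level "always compress" is just the
standard OpenSSH client option -- nothing rsync-specific:

	# ~/.ssh/config
	Host *
	    Compression yes

and an uncompressed rsync then rides the compressed channel
through the usual -e ssh transport.)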
> 
> Picture if you will a multi-dimensional "solution space", with
> dimensions including file size, number of files, network bandwidth,
> latency, etc.
> 
> Different solutions are good for different areas: perhaps LZO is
> worthwhile in most places, but bzip2 is better for very slow links and
> fast processors.  At the moment there are some regions, such as very
> large trees, that are quite impractical.

I consider the compression lib to be secondary until the
rest of the protocols are worked out and running, so I
don't want to choose one now.  Let's just say that the
compression libs should be pluggable, or added later.  I
concur that the best compression lib will depend on the
relative CPU:network:disk speeds.
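
To make "pluggable" concrete, here is a minimal sketch in
C.  Every name in it (xfer_codec and friends) is
hypothetical -- nothing like this exists in rsync today:

	/* One entry per compression library; chosen once at
	 * connection setup after a capability exchange. */
	struct xfer_codec {
		const char *name;	/* wire name, e.g. "zlib" */
		int  (*init)(void **state);
		/* Compress in[0..inlen) into out; *outlen is the
		 * buffer size on entry, bytes written on return.
		 * Returns 0 on success. */
		int  (*compress)(void *state, const char *in, int inlen,
				 char *out, int *outlen);
		int  (*decompress)(void *state, const char *in, int inlen,
				   char *out, int *outlen);
		void (*cleanup)(void *state);
	};

	/* A null codec keeps the fast-link case near memcpy() cost. */
	extern const struct xfer_codec null_codec;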

At the risk of contradicting myself -- and in my present
ignorance of the details -- I doubt bzip2 is a good option
if latency threatens deadlocks.  The bzip2 manpage indicates
it works on rather large blocks (100KB at the smallest), so
it holds output until a whole block has been buffered.
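
The block size is the second argument to the real libbzip2
init call; this fragment exists only to show that knob (link
with -lbz2):

	#include <string.h>
	#include <bzlib.h>

	static int bzip2_codec_init(bz_stream *strm)
	{
		memset(strm, 0, sizeof *strm);
		/* blockSize100k ranges 1..9, so 100KB is the
		 * smallest unit bzip2 buffers before emitting
		 * anything; BZ_FLUSH exists but costs ratio. */
		return BZ2_bzCompressInit(strm, 1, 0, 0) == BZ_OK
			? 0 : -1;
	}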

> There are some areas which we don't know much about yet, but it might
> turn out that on 1Gbps+ links, we actually need to send packets much
> larger than 32kB to be efficient.  Probably something like sendfile()
> is useful there too, so I don't want to close it off.
> 
> It's very important to me that we don't focus so much on tuning for
> one area (e.g. low bandwidth links) that we unnecessarily close off
> others.  We can't know ahead of time all the constraints and wrinkles
> that we might encounter, which is why I am arguing for taking
> advantage of experiences with previous protocols, being fairly clean,
> and allowing room for growth.

Yes.  And let's defer some specifics until we have
something to apply them to.
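
The sendfile() point is worth making concrete now, though.
A rough sketch of the Linux call -- this is just standard
sendfile(2), not anything in rsync, and error handling is
abbreviated:

	#include <sys/types.h>
	#include <sys/sendfile.h>
	#include <errno.h>

	/* Push len bytes of in_fd (a regular file) down sock_fd
	 * without copying through user space.  Returns 0 or -1. */
	static int send_file_raw(int sock_fd, int in_fd, off_t len)
	{
		off_t off = 0;	/* sendfile() advances this */

		while (off < len) {
			ssize_t n = sendfile(sock_fd, in_fd,
					     &off, len - off);
			if (n <= 0) {
				if (n < 0 && errno == EINTR)
					continue;
				return -1;
			}
		}
		return 0;
	}

Note this only helps the uncompressed path: compression or
per-block checksumming pulls the data back through user
space, which is one more reason to keep a null codec
first-class.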

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt



