superlifter design notes and a new proposal

Martin Pool mbp at samba.org
Mon Aug 5 21:03:02 EST 2002


On  5 Aug 2002, jw schultz <jw at pegasys.ws> wrote:
> On Tue, Aug 06, 2002 at 01:09:46PM +1000, Martin Pool wrote:
> > On  5 Aug 2002, Paul Haas <paulh at hamjudo.com> wrote:
> > 
> > > > Fast networks are a significant use case that ought to be considered.
> > > > On GigE, having low processing overhead and being able to use things
> > > > like sendfile() to transmit data are going to be enormously more
> > > > important than the length of the header.
> 
> I concur.  GigE is a while away for most transfers (except
> the TB+) but i think we can say that the vast majority of
> rsyncs probably fall into either the 100Mb switched or the
> 384Kb-1.4Mb internet links.

Yes.  We need to lead the target a little, though: GigE is already
something I could afford to run at home, and in a few years it will
probably be commonplace, at least in the machine room.

> > > Is there a big real world CPU penalty for letting ssh do the
> > > compression?
> > 
> > I doubt it.

To be clearer: I wasn't saying that it would be more efficient to
do compression in SSH; it won't be.  However, if you have SSH set up
to always compress (as I do), then running an uncompressed rsync
across that is not much worse.

Picture, if you will, a multi-dimensional "solution space", with
dimensions including file size, number of files, network bandwidth,
latency, etc.

Different solutions are good for different areas: perhaps LZO is
worthwhile in most places, but bzip2 is better for very slow links and
fast processors.  At the moment there are some regions, such as very
large trees, that are quite impractical.
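As a rough illustration of the solution-space argument, here is a minimal Python sketch that picks a compressor by estimated transfer time. The codec names, compression ratios and CPU throughputs are made-up parameters chosen only to show the shape of the trade-off; they are not measurements of LZO or bzip2. The model also assumes compression and sending happen one after the other rather than overlapped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codec:
    name: str
    ratio: float       # compressed size / original size (hypothetical value)
    mb_per_sec: float  # sender's compression throughput (hypothetical value)


# Illustrative numbers only; they are NOT benchmarks of real LZO or bzip2.
CODECS = [
    Codec("none", 1.00, float("inf")),
    Codec("lzo-like", 0.55, 200.0),
    Codec("bzip2-like", 0.30, 8.0),
]


def transfer_seconds(codec: Codec, size_mb: float, link_mb_per_sec: float) -> float:
    """Estimated time, assuming compression and sending are NOT overlapped."""
    cpu = size_mb / codec.mb_per_sec          # x / inf == 0.0 for "none"
    wire = size_mb * codec.ratio / link_mb_per_sec
    return cpu + wire


def best_codec(size_mb: float, link_mb_per_sec: float) -> Codec:
    """Pick the codec with the lowest estimated total transfer time."""
    return min(CODECS, key=lambda c: transfer_seconds(c, size_mb, link_mb_per_sec))


if __name__ == "__main__":
    # Link speeds in MB/s: roughly 400 kb/s, 100 Mb/s and 1 Gb/s.
    for link in (0.05, 12.5, 125.0):
        print(link, best_codec(100, link).name)
```

With these invented numbers, a 100 MB transfer picks `bzip2-like` on the slow link, `lzo-like` on 100 Mb switched Ethernet, and `none` on GigE. A pipelined model (the maximum of CPU time and wire time instead of the sum) would move those boundaries, which is one reason not to hard-wire a single choice into the protocol.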

There are some areas we don't yet know much about.  It might turn
out, for example, that on links of 1Gbps and faster we need to send
packets much larger than 32kB to be efficient.  Something like
sendfile() is probably useful there too, so I don't want to close
that option off.
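To make the sendfile() point concrete, here is a minimal Python sketch that copies a byte range from a file to another descriptor using `os.sendfile`, which wraps the Unix sendfile(2) call so the kernel moves the data without it passing through a user-space buffer. The helper name `send_file_range` and the 1 MiB chunk size are hypothetical choices, not anything rsync does. The sketch assumes blocking descriptors and Linux semantics; sending to a regular file rather than a socket needs Linux 2.6.33 or later. For scale: 1Gbps is about 125 MB/s, so 32kB messages would mean close to 4,000 per second, and fixed per-message costs start to matter. Larger calls like these amortize that overhead.

```python
import os


def send_file_range(out_fd: int, in_fd: int, offset: int, count: int,
                    chunk: int = 1 << 20) -> int:
    """Copy `count` bytes of in_fd, starting at `offset`, to out_fd.

    Uses os.sendfile so the kernel moves the data directly.  `chunk`
    (1 MiB, an arbitrary choice) caps the size of each call.  Returns the
    number of bytes sent, which is less than `count` only if in_fd hits EOF.
    Assumes blocking descriptors (no EAGAIN handling).
    """
    sent_total = 0
    while sent_total < count:
        n = os.sendfile(out_fd, in_fd, offset + sent_total,
                        min(chunk, count - sent_total))
        if n == 0:  # reached end of the input file
            break
        sent_total += n
    return sent_total
```

Passing an explicit offset leaves the input file's own position untouched, so several ranges of the same file could be sent without seeking.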

It's very important to me that we don't focus so much on tuning for
one area (e.g. low-bandwidth links) that we unnecessarily close off
others.  We can't know ahead of time all the constraints and wrinkles
that we might encounter, which is why I am arguing for taking
advantage of experience with previous protocols, being fairly clean,
and allowing room for growth.

-- 
Martin 



