superlifter design notes and a new proposal
jw schultz
jw at pegasys.ws
Mon Aug 5 21:20:02 EST 2002
On Tue, Aug 06, 2002 at 01:58:17PM +1000, Martin Pool wrote:
> On 5 Aug 2002, jw schultz <jw at pegasys.ws> wrote:
> > On Tue, Aug 06, 2002 at 01:09:46PM +1000, Martin Pool wrote:
> > > On 5 Aug 2002, Paul Haas <paulh at hamjudo.com> wrote:
> > >
> > > > > Fast networks are a significant use case that ought to be considered.
> > > > > On GigE, having low processing overhead and being able to use things
> > > > > like sendfile() to transmit data are going to be enormously more
> > > > > important than the length of the header.
> >
> > I concur. GigE is a while away for most transfers (except
> > TB+ ones), but I think we can say that the vast majority of
> > rsyncs fall on either 100Mb switched LANs or 384Kb-1.4Mb
> > internet links.
>
> Yes. We need to lead the target a little bit though: GigE is already
> something I could afford to run at home; in a few years it will
> probably be commonplace at least in the machine room.
My main point here is that we either run over a fast link,
where compression isn't buying us much, or a slow link, where
it buys a lot. At 100Mb it is a bit iffy, depending on
content vs. CPU speed. I find that current rsync with no
compression on a 300MHz CPU saturates the CPU, not the wire.
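To put rough numbers on it: 100Mb/s is about 12.5MB/s of
payload, so if zlib can't deflate faster than that on the
local CPU, the compressor and not the wire is the limit. A
throwaway check along these lines (plain zlib, arbitrary
filler data and level, so treat the result as a ballpark
only):

    /* zbench.c: crude deflate throughput check.  cc zbench.c -lz */
    #include <stdio.h>
    #include <time.h>
    #include <zlib.h>

    #define NBUF   (1 << 20)            /* 1MB of input per pass */
    #define PASSES 64

    int main(void)
    {
        static unsigned char in[NBUF];
        static unsigned char out[NBUF + NBUF / 10 + 64];
        uLongf outlen;
        clock_t t0, t1;
        int i;
        double secs, mb;

        /* semi-compressible filler; real file data will differ */
        for (i = 0; i < NBUF; i++)
            in[i] = (unsigned char)(i & 63);

        t0 = clock();
        for (i = 0; i < PASSES; i++) {
            outlen = sizeof out;
            if (compress2(out, &outlen, in, NBUF, 6) != Z_OK)
                return 1;
        }
        t1 = clock();

        secs = (t1 - t0) / (double)CLOCKS_PER_SEC;
        mb = (double)NBUF * PASSES / (1024 * 1024);
        printf("deflate -6: %.0fMB in %.2fs = %.1f MB/s\n",
               mb, secs, mb / secs);
        return 0;
    }

Anything well under 12MB/s there means the CPU, not the
wire, sets the pace at 100Mb.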
> > > > Is there a big real world CPU penalty for letting ssh do the
> > > > compression?
> > >
> > > I doubt it.
>
> To be more clear, I wasn't saying that it would be more efficient to
> do compression in SSH; it won't be. However, if you have SSH set up
> to always compress (as I do), then it is not much worse to do an
> uncompressed rsync across that.
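(For anyone following along: "set up to always compress" is
just the OpenSSH Compression option, ssh -C on the command
line or in ~/.ssh/config:

    Host *
        Compression yes

Nothing the rsync side needs to know about.)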
>
> Picture if you will a multi-dimensional "solution space", with
> dimensions including file size, number of files, network bandwidth,
> latency, etc.
>
> Different solutions are good for different areas: perhaps LZO is
> worthwhile in most places, but bzip2 is better for very slow links and
> fast processors. At the moment there are some regions, such as very
> large trees, that are quite impractical.
I consider the compression lib to be secondary until the
rest of the protocols are worked out and running, so I don't
want to get into choosing one now. Let's leave it that the
compression libs should be pluggable or added later. I
concur that the best compression lib will depend on the
relative CPU:network:disk speeds.
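Something as simple as a table of entry points would cover
it. Off the top of my head (names and signatures invented
for illustration, not a proposal):

    /* hypothetical pluggable-codec hooks, sketch only */
    #include <string.h>

    struct compressor {
        const char *name;          /* token used in protocol setup */
        void *(*init)(int level);  /* opaque per-stream state */
        int  (*compress)(void *st, const char *in, int inlen,
                         char *out, int outmax, int flush);
        int  (*decompress)(void *st, const char *in, int inlen,
                           char *out, int outmax);
        void (*done)(void *st);
    };

    /* trivial "null" codec so the table always has a live entry */
    static int null_state;
    static void *null_init(int level) { (void)level; return &null_state; }
    static void null_done(void *st) { (void)st; }
    static int null_compress(void *st, const char *in, int inlen,
                             char *out, int outmax, int flush)
    {
        (void)st; (void)flush;
        if (inlen > outmax)
            inlen = outmax;
        memcpy(out, in, inlen);
        return inlen;
    }
    static int null_decompress(void *st, const char *in, int inlen,
                               char *out, int outmax)
    {
        return null_compress(st, in, inlen, out, outmax, 0);
    }

    /* both ends walk this at startup to agree on a codec */
    static struct compressor compressors[] = {
        { "null", null_init, null_compress, null_decompress,
          null_done },
    };

The null entry also gives the fast-network case "no
compression" for free.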
To be self-contradictory, even in my present ignorance of
the details I doubt bzip2 is a good option where latency
threatens deadlocks: the utility's manpage indicates it
wants rather large buffers (100KB+).
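That is, even at its smallest setting bzlib buffers on the
order of 100KB before emitting anything. Guessing at how it
would be driven (the bzlib calls are real, the wrapper is
mine):

    #include <string.h>
    #include <bzlib.h>

    static int bz_setup(bz_stream *strm)
    {
        memset(strm, 0, sizeof *strm);
        /* blockSize100k = 1 selects 100,000-byte blocks, the
         * smallest bzip2 supports, so up to ~100KB can sit
         * buffered before any output appears; forcing it out
         * early with BZ_FLUSH costs compression ratio. */
        return BZ2_bzCompressInit(strm, 1, 0, 0) == BZ_OK;
    }

With 32KB-class protocol buffers that means either stalling
the sender or flushing constantly, and either way bzip2
loses much of its advantage.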
> There are some areas which we don't know much about yet, but it might
> turn out that on 1Gbps+ links, we actually need to send packets much
> larger than 32kB to be efficient. Probably something like sendfile()
> is useful there too, so I don't want to close it off.
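No argument. For reference, the Linux flavor is about as
simple as it gets (2.4-era signature; other platforms spell
it differently, so it could only ever be an optional fast
path):

    #include <sys/types.h>
    #include <sys/sendfile.h>

    /* push a whole regular file down a socket without copying
     * it through user space; returns -1 on any error so the
     * caller can fall back to plain read()/write() */
    static ssize_t push_file(int sock, int fd, off_t len)
    {
        off_t off = 0;

        while (off < len) {
            ssize_t sent = sendfile(sock, fd, &off, len - off);
            if (sent <= 0)
                return -1;
        }
        return (ssize_t)off;
    }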
>
> It's very important to me that we don't focus so much on tuning for
> one area (e.g. low bandwidth links) that we unnecessarily close off
> others. We can't know ahead of time all the constraints and wrinkles
> that we might encounter, which is why I am arguing for taking
> advantage of experiences with previous protocols, being fairly clean,
> and allowing room for growth.
Yes. And let's defer some specifics until we have something
to apply them to.
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt