performance suggestion: sparse files
Martin Pool
mbp at sourcefrog.net
Tue Sep 9 13:48:16 EST 2003
On 26 Aug 2003 jw schultz <jw at pegasys.ws> wrote:
> On Tue, Aug 26, 2003 at 11:28:12AM -0700, Jon Howell wrote:
> > I worked around the problem by adding -z to compress the stream
> > first (blocks of zeros compress remarkably well), and that made the
> > virtual disk image transfer go much faster. Of course, all of the
> > .tgzs and .tbzs in the same transfer got slower waiting on the
> > source CPU to compress the incompressible.
>
> That is what I would have recommended.
>
> > The obvious solution is to <music type=organ register=bass>change
> > the protocol</music>, but that seems like a scary thing to do for a
> > performance tweak. What about an option for
> > "really-crappy-compression"? Something really cheezy (RLE) that can
> > decide in a hurry whether to compress away a string of zeros, and if
> > not, just send them raw. That way, performance on compressed files
> > stays I/O bound even on systems with pokey CPUs, but sparse files
> > are disk-bound on the source system (as they should be). (And, of
> > course, --sparse would automatically promote the compression level
> > to "really-crappy" if it was at "none" before.)
>
> This is really only an issue when rsync hits a new file. I
> agree an RLE of the stream _sounds_ like a good idea. But
> even better might be an extra phantom block that represents
> all zeros. That too would require a protocol bump.

I'd want to be convinced that this was really enough cheaper than -z1
to justify the complexity.

(For rdiff, having a cheap encoding of zeros would seem to make sense...)

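To make the "cheap encoding of zeros" idea concrete, here is a minimal
sketch in Python. It is not rsync's or rdiff's wire format; the token
names ("raw", "zeros"), the function names, and the 64-byte threshold
are all assumptions made up for illustration. It replaces long runs of
zero bytes with a count and passes everything else through untouched.

```python
import re


def encode_zero_runs(data, min_run=64):
    """Split data into ("raw", bytes) and ("zeros", count) tokens.

    Hypothetical format for illustration only: runs of at least
    min_run zero bytes are replaced by their length; everything
    else is passed through as-is.
    """
    pattern = re.compile(rb"\x00{%d,}" % min_run)
    tokens = []
    pos = 0
    for match in pattern.finditer(data):
        if match.start() > pos:
            tokens.append(("raw", data[pos:match.start()]))
        tokens.append(("zeros", match.end() - match.start()))
        pos = match.end()
    if pos < len(data):
        tokens.append(("raw", data[pos:]))
    return tokens


def decode_zero_runs(tokens):
    """Rebuild the original bytes from the token list."""
    out = bytearray()
    for kind, value in tokens:
        if kind == "zeros":
            out.extend(bytes(value))  # bytes(n) is n zero bytes
        else:
            out.extend(value)
    return bytes(out)
```

Short zero runs stay in the raw tokens, so already-compressed data
costs little more than a scan. Whether that scan is enough cheaper
than -z1 to matter is exactly the open question.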
> There is no way in user-mode to distinguish between a sparse file and
> a file full of zeroed blocks.

That is correct, strictly speaking. You can, however, compare the
allocated-blocks measure against the file size to guess whether a
file is sparse or holds preallocated zeros, which might be useful for
backups. Either way, there is no way around reading the blocks.
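
A minimal sketch of that guess in Python, assuming a Unix-like system
where os.stat() reports st_blocks in 512-byte units. The function
names are made up for illustration, and the result is only a
heuristic: filesystems that compress data or store small files inline
can report fewer blocks than the size implies.

```python
import os


def looks_sparse(size_bytes, allocated_blocks, block_unit=512):
    """Guess sparseness from the size and the allocated block count.

    Assumes allocated_blocks is counted in block_unit-byte units,
    as st_blocks is on Linux. Fewer allocated bytes than the
    apparent size suggests the file has holes.
    """
    return allocated_blocks * block_unit < size_bytes


def file_looks_sparse(path):
    """Apply the heuristic to a file on disk (Unix-only: uses st_blocks)."""
    st = os.stat(path)
    return looks_sparse(st.st_size, st.st_blocks)
```

For example, a 1 MiB file with 8 allocated blocks is probably sparse,
while a 4096-byte file with 8 blocks is fully allocated. This only
tells you that holes probably exist, not where they are, so the
blocks still have to be read.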
--
Martin