performance suggestion: sparse files
mbp at sourcefrog.net
Tue Sep 9 13:48:16 EST 2003
On 26 Aug 2003 jw schultz <jw at pegasys.ws> wrote:
> On Tue, Aug 26, 2003 at 11:28:12AM -0700, Jon Howell wrote:
> > I worked around the problem by adding -z to compress the stream
> > first (blocks of zeros compress remarkably well), and that made the
> > virtual disk image transfer go much faster. Of course, all of the
> > .tgzs and .tbzs in the same transfer got slower waiting on the
> > source CPU to compress the incompressible.
> That is what I would have recommended.
> > The obvious solution is to <music type=organ register=bass>change
> > the protocol</music>, but that seems like a scary thing to do for a
> > performance tweak. What about an option for
> > "really-crappy-compression"? Something really cheesy (RLE) that can
> > decide in a hurry whether to compress away a string of zeros, and if
> > not, just send them raw. That way, performance on compressed files
> > stays I/O bound even on systems with pokey CPUs, but sparse files
> > are disk-bound on the source system (as they should be). (And, of
> > course, --sparse would automatically promote the compression level
> > to "really-crappy" if it was at "none" before.)
> This is really only an issue when rsync hits a new file. I
> agree an RLE of the stream _sounds_ like a good idea. But
> even better might be an extra phantom block that represents
> all zeros. That too would require a protocol bump.
I'd want to be convinced that this was really enough cheaper than -z1
to justify the complexity.
(For rdiff having cheap encoding of zeros would seem to make sense...)
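To make the "cheap encoding of zeros" idea concrete, here is a rough sketch of what such a pass could look like. This is illustration only, not rsync or rdiff code: the token format, the function names, and the 4096-byte block size are all assumptions. Whether it actually beats -z1 by enough to matter would still have to be measured.

```python
# Hypothetical zero-run encoder: every name and the token format are
# assumptions made for illustration, not part of any rsync/rdiff protocol.
from typing import List, Tuple, Union

BLOCK = 4096  # assumed block size

Token = Tuple[str, Union[int, bytes]]


def encode_zero_runs(data: bytes, block: int = BLOCK) -> List[Token]:
    """Split data into ('zeros', length) and ('raw', payload) tokens.

    Only whole all-zero chunks are collapsed; anything else is passed
    through untouched, so incompressible data costs one comparison per chunk.
    """
    tokens: List[Token] = []
    zero_block = bytes(block)
    for off in range(0, len(data), block):
        chunk = data[off:off + block]
        if chunk == zero_block[:len(chunk)]:
            # Extend the previous zero run if there is one.
            if tokens and tokens[-1][0] == "zeros":
                tokens[-1] = ("zeros", tokens[-1][1] + len(chunk))
            else:
                tokens.append(("zeros", len(chunk)))
        else:
            # Merge adjacent raw chunks into one token.
            if tokens and tokens[-1][0] == "raw":
                tokens[-1] = ("raw", tokens[-1][1] + chunk)
            else:
                tokens.append(("raw", chunk))
    return tokens


def decode_zero_runs(tokens: List[Token]) -> bytes:
    """Rebuild the original byte string from the token list."""
    out = bytearray()
    for kind, value in tokens:
        if kind == "zeros":
            out.extend(bytes(value))  # bytes(n) is n zero bytes
        else:
            out.extend(value)
    return bytes(out)
```

A receiver running with --sparse could in principle turn each 'zeros' token into a seek rather than a write, but that part is not shown here.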
> There is no way in user-mode to distinguish between a sparse file and
> a file full of zeroed blocks.
That is correct.
Actually you can guess by looking at the allocated-blocks measure, and
use this to guess whether it's preallocated zeros or sparse, which
might be useful for backups. But there is no way around reading the
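The allocated-blocks guess mentioned above can be sketched roughly as follows. This assumes a POSIX system where stat reports st_blocks in 512-byte units; the function names are made up for the example. It is only a heuristic: filesystems that compress data or store small files inline can report fewer blocks than the file size implies, even when the file has no holes.

```python
import os

# Assumption: st_blocks counts 512-byte units, as on Linux and most
# POSIX systems; it is independent of the filesystem block size.
STAT_BLOCK_UNIT = 512


def probably_sparse(size: int, st_blocks: int) -> bool:
    """Guess sparseness from the logical size and the allocated block count."""
    return st_blocks * STAT_BLOCK_UNIT < size


def file_probably_sparse(path: str) -> bool:
    """Apply the guess to a file on disk (hypothetical helper, POSIX only)."""
    st = os.stat(path)
    return probably_sparse(st.st_size, st.st_blocks)
```

For a 1 MiB file with only 8 blocks (4 KiB) allocated, probably_sparse returns True; for a 4 KiB file with 8 blocks allocated it returns False.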