performance suggestion: sparse files

jw schultz jw at pegasys.ws
Wed Aug 27 15:51:32 EST 2003


On Tue, Aug 26, 2003 at 01:45:49PM -0700, jw schultz wrote:
> On Tue, Aug 26, 2003 at 11:28:12AM -0700, Jon Howell wrote:
> > So I was transferring a 2GB virtual machine disk image over a slow
> > wireless link. Of course I used --sparse, to keep the image small on the
> > destination end as well as on the source end.
> > 
> > Much to my surprise, I noticed that the transfer took a long time even
> > when it got past the first 0.5GB of actually-populated file. A little
> > sleuthing with strace revealed that the source rsync was dutifully reading
> > block after block of zeros, sending them to ssh, who compressed them and
> > sent them across the wire(less), where another rsync got the zero blocks,
> > realized that they were sparse, and just bode its time until it could do
> > one big seek to the next non-sparse block. ("bode its time"? Who writes
> > like that?)
> 
> > The obvious solution is to <music type=organ register=bass>change the
> > protocol</music>, but that seems like a scary thing to do for a
> > performance tweak. What about an option for "really-crappy-compression"?
> > Something really cheezy (RLE) that can decide in a hurry whether to
> > compress away a string of zeros, and if not, just send them raw. That way,
> > performance on compressed files stays I/O bound even on systems with pokey
> > CPUs, but sparse files are disk-bound on the source system (as they should
> > be). (And, of course, --sparse would automatically promote the compression
> > level to "really-crappy" if it was at "none" before.)
> 
> This is really only an issue when rsync hits a new file.  I
> agree an RLE of the stream _sounds_ like a good idea.  But
> even better might be an extra phantom block that represents
> all zeros.  That too would require a protocol bump.
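
For concreteness, a rough sketch of the zero-run RLE idea quoted
above. This is illustration only, not rsync code: the token names,
the function names and the min_run threshold are all made up here,
and a real implementation would scan in bulk rather than byte by
byte.

```python
def rle_zero_encode(data: bytes, min_run: int = 64):
    """Split data into ("zeros", length) and ("raw", bytes) tokens.

    Hypothetical format: only runs of at least min_run NUL bytes are
    collapsed; everything else is passed through untouched.
    """
    tokens = []
    raw_start = 0          # start of the pending raw (uncompressed) span
    i = 0
    n = len(data)
    while i < n:
        if data[i] == 0:
            j = i
            while j < n and data[j] == 0:
                j += 1     # extend the run of zeros
            if j - i >= min_run:
                if raw_start < i:
                    tokens.append(("raw", data[raw_start:i]))
                tokens.append(("zeros", j - i))
                raw_start = j
            i = j
        else:
            i += 1
    if raw_start < n:
        tokens.append(("raw", data[raw_start:]))
    return tokens


def rle_zero_decode(tokens) -> bytes:
    """Inverse of rle_zero_encode."""
    out = bytearray()
    for kind, value in tokens:
        if kind == "zeros":
            out.extend(bytes(value))   # bytes(n) is n NUL bytes
        else:
            out.extend(value)
    return bytes(out)
```

A long hole collapses to a single ("zeros", n) token, so the cost on
the wire no longer depends on the size of the hole.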

On reconsideration, having a phantom all-NUL block would not
require a protocol bump.  If the receiver were to send a
blocksum for it as part of the blocksum array and recognize
the block offset as phantom, the sender need know nothing of
it.
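
A toy model of the phantom block, to show why the sender needs no
changes. Everything here is an assumption for illustration: the
function names, the 4096-byte block, MD5 standing in for the strong
checksum, and matching only at block-aligned offsets with no
rolling checksum. It is not how rsync is actually structured.

```python
import hashlib

BLOCK = 4096  # hypothetical block length


def strong_sum(block: bytes) -> bytes:
    # Stand-in for the real strong checksum.
    return hashlib.md5(block).digest()


def receiver_blocksums(basis: bytes):
    """Sums for each full basis block, plus one phantom all-NUL entry."""
    full = len(basis) // BLOCK
    sums = [strong_sum(basis[i * BLOCK:(i + 1) * BLOCK]) for i in range(full)]
    phantom_index = len(sums)            # index past the real blocks
    sums.append(strong_sum(bytes(BLOCK)))
    return sums, phantom_index


def sender_delta(source: bytes, sums):
    """Unmodified sender: emits ("match", index) or ("literal", bytes)."""
    lookup = {}
    for idx, s in enumerate(sums):
        lookup.setdefault(s, idx)        # first block with a given sum wins
    delta = []
    for off in range(0, len(source), BLOCK):
        chunk = source[off:off + BLOCK]
        idx = lookup.get(strong_sum(chunk)) if len(chunk) == BLOCK else None
        delta.append(("match", idx) if idx is not None else ("literal", chunk))
    return delta


def receiver_apply(basis: bytes, delta, phantom_index: int):
    """Rebuild the file; only the receiver knows which index is phantom."""
    out = bytearray()
    holes = 0
    for kind, value in delta:
        if kind == "literal":
            out.extend(value)
        elif value == phantom_index:
            out.extend(bytes(BLOCK))     # a real receiver could seek instead
            holes += 1
        else:
            out.extend(basis[value * BLOCK:(value + 1) * BLOCK])
    return bytes(out), holes
```

With an empty basis, every all-NUL block in the source comes back as
a match against the phantom index instead of a block of literal
zeros.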

The difficulty would be doing it for a file that already has
a size without messing with the partial block at the end.
For a new file or a whole-file transfer, where we currently
send an empty blocksum array, we wouldn't have the partial
block tail problem.
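
The tail arithmetic, sketched under the assumption that the
blocksum array is described by a block count plus the length of a
short final block, so only the last entry may be partial. The
function names and the "safe" rule are hypothetical, just one
reading of the problem.

```python
def blocksum_layout(file_size: int, block_len: int):
    """Return (count, remainder) for a basis file of file_size bytes.

    remainder is the length of the short final block, or 0 if the
    file is an exact multiple of block_len (or empty).
    """
    full, remainder = divmod(file_size, block_len)
    count = full + (1 if remainder else 0)
    return count, remainder


def phantom_is_simple(file_size: int, block_len: int) -> bool:
    """Hypothetical rule: an extra full-length entry appended after
    the real ones causes no confusion only when there is no short
    tail block for it to displace from the last slot."""
    _, remainder = blocksum_layout(file_size, block_len)
    return remainder == 0
```

A new file has size 0, so count and remainder are both 0 and the
phantom would be the only entry.  A 10000-byte basis with 4096-byte
blocks has three entries, the last one 1808 bytes long, and that is
the case that needs care.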

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt


