Slowness and sparse files

Wed Apr 29 09:58:21 GMT 2009

Hi - I'm trying to diagnose slow rsync transfers. I'm copying a lot of
fresh data between two systems over gigabit ethernet, using ssh as the
transport. Both systems have large RAID disks with pretty fast read and
write benchmarks (>100MB/s), and both are running Fedora 10 Linux. I
only get around 15MB/s transferred between the two systems using:

rsync -raxSH --numeric-ids /indir sys2:/outdir

I've tried switching to rsh, but that doesn't help a great deal. In
simple data copy tests, however, I get close to maximum gigabit speeds.

Running strace on the rsync process on the destination system, I see it
does a lot of seeking between writes. It seems that the sparse file
support seeks over even tiny runs of zero bytes. Wouldn't it make sense
to require a minimum of ~1024 zero bytes before doing a seek? I suspect
the seeking increases the overhead quite a bit.
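
To make the threshold idea concrete, here is a minimal sketch in Python.
It is not rsync's code (rsync is written in C, and I have not reproduced
its logic here); sparse_write and MIN_HOLE are hypothetical names chosen
only for illustration. Zero runs shorter than the threshold are written
out as ordinary data, and only longer runs are skipped with a seek.

```python
import os

MIN_HOLE = 1024  # hypothetical threshold, in bytes


def sparse_write(f, data, min_hole=MIN_HOLE):
    """Write data to the binary file f, seeking over long zero runs.

    Zero runs shorter than min_hole are written as normal data.
    Returns the number of seeks performed.
    """
    seeks = 0
    pos = 0      # next byte of data to examine
    pending = 0  # start of the data not yet written
    n = len(data)
    while pos < n:
        if data[pos] != 0:
            pos += 1
            continue
        # Measure the length of this run of zero bytes.
        end = pos
        while end < n and data[end] == 0:
            end += 1
        if end - pos >= min_hole:
            # Flush what came before the run, then skip over the run.
            f.write(data[pending:pos])
            f.seek(end - pos, os.SEEK_CUR)
            seeks += 1
            pending = end
        pos = end
    f.write(data[pending:])
    # If the data ended in a skipped run, extend the file to full length.
    f.truncate(f.tell())
    return seeks
```

Calling sparse_write(f, data, min_hole=1) seeks over every zero run,
however short, which is roughly the behaviour I think I am seeing in the
strace output; a larger min_hole trades a little extra written data for
fewer seeks.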

Can file systems actually record such small sparse regions? I would
assume that they handle sparse files on a block basis (at least on ext2
etc), so it's not clear to me why rsync has SPARSE_WRITE_SIZE set to
1024 (rather than 4096), or why there isn't a minimum threshold of
~1024 bytes before seeking.
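
One way to check the block-granularity assumption empirically is to
create files with holes of different sizes and compare the space
actually allocated. The following is only a rough sketch: the helper
name allocated_bytes is made up, it assumes a Unix-like system where
st_blocks is available and counted in 512-byte units (as on Linux), and
the results depend on the file system the temporary file lands on.

```python
import os
import tempfile


def allocated_bytes(hole_size, directory=None):
    """Create a file of one byte, a hole of hole_size bytes, one byte.

    Returns (apparent size, bytes allocated on disk).
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"x")
            f.seek(hole_size, os.SEEK_CUR)  # skip instead of writing zeros
            f.write(b"x")
            f.flush()
            os.fsync(f.fileno())  # make sure blocks are really allocated
        st = os.stat(path)
        # Assumption: st_blocks is in 512-byte units, as on Linux.
        return st.st_size, st.st_blocks * 512
    finally:
        os.remove(path)


if __name__ == "__main__":
    for hole in (100, 1024, 4096, 1 << 20):
        size, alloc = allocated_bytes(hole)
        print(f"hole={hole:8d}  size={size:8d}  allocated={alloc:8d}")
```

If the allocated figure stays the same for the 100-byte and 1024-byte
holes and only drops below the apparent size for the much larger hole,
that would suggest holes smaller than a block save nothing on that file
system.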

I'm not sure that this is the cause of my problem, but it would at
least be a sensible optimisation.

Jeremy

-- 
Jeremy Sanders <jeremy at jeremysanders.net>
http://www.jeremysanders.net/                Cambridge, UK
Public Key Server PGP Key ID: E1AAE053