performance with >50GB files

Wayne Davison wayned at
Tue Jan 10 20:47:47 GMT 2006

On Tue, Jan 10, 2006 at 09:02:14PM +0100, René Rebe wrote:
> So far just increasing the block-size significantly (10-20MB) bumps
> the speed by magnitudes into useful regions.

That's good.  For some reason I was thinking that the block size was
nearly maxed out for a 50GB file, but I can see that the current
algorithm for selecting the block size ends up nowhere near the
maximum block-size limit.
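For context, the generator of that era picked a block length roughly
proportional to the square root of the file length, clamped to a
[minimum, maximum] range.  A rough sketch in Python (the rounding and
the clamp constants here are illustrative assumptions, not rsync's
exact code):

```python
import math

def pick_block_length(file_len, min_blength=700, max_blength=1 << 29):
    """Illustrative square-root block-size heuristic.

    The constants and rounding are assumptions for demonstration,
    not rsync's exact implementation.
    """
    if file_len <= min_blength * min_blength:
        return min_blength
    blength = int(math.sqrt(file_len))
    blength -= blength % 8          # round down to a multiple of 8
    return max(min_blength, min(blength, max_blength))

blength = pick_block_length(50 * 10**9)   # a ~50GB file
print(blength)                             # roughly a 220KB block length
print(50 * 10**9 // blength)               # hundreds of thousands of blocks
```

Under this heuristic a 50GB file gets ~220KB blocks -- nowhere near the
maximum block-size limit -- yet still produces a couple hundred thousand
block entries, whereas forcing a 10-20MB block size cuts that to a few
thousand, which matches the speedup observed above.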

> Is there any plan to properly deal with this [...] ?

I plan to eventually look into making the hash-search more efficient
when dealing with really large block counts, but my current (self-
selected) priorities are to first work on optimizing large numbers of
files before really large files (since that seems to affect more
people).  However, I am available for contract work if someone would
like to invest money into improving a particular area of rsync's
open-source code (which would let me work on rsync more than I can in
just my free time).

Also, feel free to discuss potential solutions on the list and/or work
up a patch to make this better.  I know the subject has come up on
the list before, and we bandied about a few ideas, but nothing concrete
came out of the discussions -- at least, not yet.  For instance, there
was a suggested improvement in the "Rsyncing really large files" thread
from February/March 2005, but I never received a patch for it (the
suggestion was to come up with a hashing algorithm that sizes the hash
table based on the block count).
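For background on why large block counts hurt: the matcher hashes each
block's weak rolling checksum into a fixed-size table and, at each byte
offset of the file being scanned, walks the matching bucket's chain of
candidates.  With hundreds of thousands of blocks sharing a fixed
number of buckets, those chains get long.  The improvement floated in
that thread amounts to sizing the table from the block count; a
hypothetical sketch (the 65536 floor and the power-of-two masking are
my assumptions, not a committed design):

```python
def next_pow2(n):
    """Smallest power of two >= n."""
    p = 1
    while p < n:
        p <<= 1
    return p

def build_hash_table(block_sums):
    """Bucket block indices by weak checksum, sizing the bucket array
    from the block count rather than using a fixed 16-bit tag table.

    Hypothetical sketch of the suggested improvement, not rsync code.
    """
    nbuckets = max(65536, next_pow2(len(block_sums)))
    mask = nbuckets - 1
    table = [[] for _ in range(nbuckets)]
    for idx, weak_sum in enumerate(block_sums):
        table[weak_sum & mask].append(idx)
    return table, mask

def lookup(table, mask, weak_sum):
    # Candidate block indices to verify with the strong checksum.
    return table[weak_sum & mask]
```

Keeping the bucket count proportional to the block count keeps the
average chain length near constant, so the per-offset lookup cost no
longer grows with file size.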

