[Bug 1463] New: poor performance with large block size

Chris Shoemaker c.shoemaker at cox.net
Wed Jul 14 22:27:45 GMT 2004


On Thu, Jun 17, 2004 at 08:47:57PM -0700, Craig Barratt wrote:
> 
> > But, the comment seems to have been right on. I have re-run the
> > experiment with block sizes as small as 3000 (yes it took a long
> > time to complete) all the way up to block sizes of 100000 with it
> > working in reasonable times. But, when the block size approaches
> > 170,000 or so, the performance degrades exponentially.
> >
> > I understand that I am testing at the very fringes of what we should
> > expect rsync to do. File sizes of 25Gig and 55Gig are beyond what was
> > originally envisioned (based on 64k hash buckets and a sliding window
> > of 256k).
> 
> Here's a patch to try.  It basically ensures that the window is
> at least 16 times the block size.  Before I'd endorse this patch
> for CVS we need to make sure there aren't cases where map_ptr is
> called with a much bigger length, making the 16x a bit excessive.
> 
> Perhaps I would be tempted to repeat the previous check that the
> window start plus the window size doesn't exceed the file length,
> although it must be at least offset + len - window_start as in
> the original code.

hehe, I'm catching up to you guys.  I'm kinda late.  That's what I get
for letting my email back up for >1 month. :)

> 
> In any case, I'd be curious if this fixes the problem.
> 
> Craig
> 
> --- rsync-2.6.2/fileio.c        Sun Jan  4 19:57:15 2004
> +++ ../rsync-2.6.2/fileio.c     Thu Jun 17 19:33:26 2004
> @@ -193,8 +193,8 @@
>         if (window_start + window_size > map->file_size) {
>                 window_size = map->file_size - window_start;
>         }
> -       if (offset + len > window_start + window_size) {
> -               window_size = (offset+len) - window_start;
> +       if (offset + 16 * len > window_start + window_size) {
> +               window_size = (offset + 16 * len) - window_start;
>         }

My initial reaction (having not actually read the code) is that it would
be desirable make the window_size highly composite, and then ensure that
the block size is an integer factor of the window_size.  In other words,
avoid the memmoves altogether.

{/me thinks a bit}

Actually, I don't think this removes the pathological case at all.  It just
reduces the frequency of the impact by a factor of 16.  Consider when
len = window_size/16 - 1.  We'll still end up with a memmove of size
len-16, which is expensive when len is large, despite window_size being
16 times larger.

Besides making window_size mod len = 0, another solution could be to use
a circular buffer.

This is a pretty nasty bug you guys found.  I hope we can fix it soon.

-chris

>  
>         /* make sure we have allocated enough memory for the window */
> -- 
> To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


More information about the rsync mailing list