Key for high CPU usage

Daniel.Li daniel_li at usish.com
Thu Jun 11 07:07:00 GMT 2009


Hi Leen,

Thanks for your reply.

On Thu, 2009-06-11 at 08:24 +0200, Leen Besselink wrote:
> Leen Besselink wrote:
> > Daniel.Li wrote:
> >> Dear List,
> >>
> >> I'm trying to take a closer look at the rsync code, and found that when we run
> >> the daemon, it takes a lot of CPU (on a 400MHz CPU). So I'd like to know: which
> >> part of the rsync 3.0.5 code consumes the most CPU?
> >>
> >> Can anyone here enlighten me? Then I can try to improve the
> >> performance or lower the CPU usage.
> >>
> >>
> >> I suspect that a few factors might be related to the CPU
> >> usage: the rolling checksum, disk I/O (a sliding window has been implemented),
> >> and reading or writing?
> >>
> >>
> >> Hope I can find some info here! Thanks in advance! 
> >>
> >>
> > 
> > Hi Daniel,
> > 
> > Not sure how much you know about how rsync works, but maybe you first want
> > to know how the algorithm works? I'm fairly sure it accounts for a large part
> > of the CPU usage:
> > 
> > http://www.samba.org/rsync/tech_report/
> > 
> > But I personally enjoyed the talk Andrew Tridgell gave at OLS in 2000
> > more; here is a transcript:
> > 
> > http://olstrans.sourceforge.net/release/OLS2000-rsync/OLS2000-rsync.html
> > 
> > Here are the slides of the talk:
> > 
> > ftp://ftp.samba.org/pub/tridge/talks/rsync_ols.tgz
> > 
> > I wouldn't be surprised if you were able to find the mp3 online somewhere
> > with the filename:
> > 
> > 2000-07-21_15-02-49_C_64.mp3


I'm glad to see the above info, and I'll take a look at it a bit later.
Much appreciated.



> > 
> > 
> 
> I was checking the talk and did find this bit:
> 
> "in fact the bottleneck, when people use the -z option, 90% of the CPU is
> in gzip, you know, the zlib library."
> 
> So if you enabled compression, then you probably know where your CPU time went.


Yes, indeed. I think there are two separate questions here:

a) CPU usage of the rsync GPL code:

As you said, the -z option is a factor. But I disabled -z and rsync
still uses a lot of CPU, around 87% on my 400MHz ARM CPU. So I think it
has something to do with the algorithm (and the hardware).
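One reason the algorithm costs CPU even without -z: the rsync tech report describes a weak "rolling" checksum that the sending side slides across the file one byte at a time looking for block matches, so the per-byte update has to be O(1). Below is a minimal C sketch of that rolling update, modeled on the tech report's description (two 16-bit sums, a and b) rather than copied from the 3.0.5 source; the names `weak_sum`, `weak_init`, `weak_roll` and `weak_value` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Weak checksum state for one window of len bytes (hypothetical type). */
typedef struct {
    uint32_t a;   /* sum of the bytes in the window, mod 2^16 */
    uint32_t b;   /* sum of the running a values (first byte weighted len), mod 2^16 */
    size_t len;   /* window length in bytes */
} weak_sum;

/* Compute the checksum of buf[0..len) from scratch: O(len).
 * Unsigned wraparound is harmless because 2^16 divides 2^32. */
static weak_sum weak_init(const unsigned char *buf, size_t len)
{
    weak_sum s = {0, 0, len};
    for (size_t i = 0; i < len; i++) {
        s.a += buf[i];
        s.b += s.a;
    }
    s.a &= 0xffff;
    s.b &= 0xffff;
    return s;
}

/* Slide the window forward one byte: drop `out`, append `in`. O(1). */
static void weak_roll(weak_sum *s, unsigned char out, unsigned char in)
{
    s->a = (s->a - out + in) & 0xffff;
    s->b = (s->b - (uint32_t)(s->len * out) + s->a) & 0xffff;
}

/* Pack both halves into the 32-bit value used for hash lookups. */
static uint32_t weak_value(const weak_sum *s)
{
    return s->a | (s->b << 16);
}
```

Even with an O(1) update, this runs for every byte examined, and each weak match is then confirmed with a much more expensive strong checksum, so on a 400MHz CPU the checksum loop is a plausible place to look first.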

I hope to find some clues for lowering the CPU usage; maybe there is a
way to optimize the code. I don't know the code very well, though. I
think version 3.x has improved a lot over 2.6.9, but I'm wondering
whether we could optimize it further.

b) My diff code adds 10% or more CPU usage.

I just finished a diff module based on the rsync GPL code. It can
save/restore diff data. But it DOES take a lot of CPU: usage rises from
87% to 97.7%. I think this extra 10% (or more) comes from my diff code.
In theory, it should NOT need any extra CPU.

The people maintaining the rsync code have more experience in this area,
and I think they may have run into this before. Currently, I don't know
what the root cause of this extra CPU usage is.

I suspect the difference in how data is read (rsync uses a sliding
window) might be the root cause.

My current procedure for checksum calculation is:
allocate buffer --> read data --> checksum --> release buffer.
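The per-block procedure just described can be sketched roughly like this, assuming fixed-size blocks read from a stdio `FILE`. `block_sum` is a hypothetical stand-in for the real per-block checksum and `checksum_blocks` is a hypothetical name; neither comes from rsync.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the real per-block checksum. */
static uint32_t block_sum(const unsigned char *buf, size_t len)
{
    uint32_t s = 0;
    for (size_t i = 0; i < len; i++)
        s = s * 31 + buf[i];
    return s;
}

/* Checksum each block of fp as: allocate -> read -> checksum -> release.
 * Stores up to max_out checksums in out[]; returns the block count,
 * or -1 if an allocation fails. */
static long checksum_blocks(FILE *fp, size_t block_len, uint32_t *out, size_t max_out)
{
    long nblocks = 0;
    for (;;) {
        unsigned char *buf = malloc(block_len);      /* allocate */
        if (buf == NULL)
            return -1;
        size_t got = fread(buf, 1, block_len, fp);   /* read */
        if (got == 0) {                              /* EOF or read error */
            free(buf);
            break;
        }
        if ((size_t)nblocks < max_out)
            out[nblocks] = block_sum(buf, got);      /* checksum */
        free(buf);                                   /* release */
        nblocks++;
    }
    return nblocks;
}
```

Moving the `malloc`/`free` out of the loop and reusing one buffer is an easy first experiment; whether it matters compared with the reads and the checksum itself is something only a measurement on the target board can show.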

The rsync code does:
create sliding window (first time, or when the window is too small) -->
feed back data (reading more only if necessary) --> checksum
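The rsync-style steps (create the window on first use or when it is too small, hand back data already buffered, read only what is missing) can be sketched as below. This is a simplified, hypothetical `window` type written for illustration; rsync's own window code (`map_ptr()` in fileio.c) is more involved. The `reads` counter exists only to show how many `fread` calls are saved.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified sliding read window over a file. */
typedef struct {
    FILE *fp;
    unsigned char *buf;
    size_t cap;      /* allocated size of buf */
    long start;      /* file offset of buf[0] */
    size_t have;     /* number of valid bytes in buf */
    long reads;      /* fread calls made, for illustration */
} window;

static int window_open(window *w, FILE *fp, size_t cap)
{
    w->fp = fp;
    w->buf = malloc(cap);
    w->cap = cap;
    w->start = 0;
    w->have = 0;
    w->reads = 0;
    return w->buf == NULL ? -1 : 0;
}

static void window_close(window *w)
{
    free(w->buf);
    w->buf = NULL;
}

/* Return a pointer to len bytes at file offset off, or NULL on error/short file. */
static const unsigned char *window_get(window *w, long off, size_t len)
{
    long end = w->start + (long)w->have;
    if (off >= w->start && off + (long)len <= end)
        return w->buf + (off - w->start);            /* already buffered: no I/O */

    if (len > w->cap) {                               /* window too small: grow it */
        unsigned char *nb = realloc(w->buf, len);
        if (nb == NULL)
            return NULL;
        w->buf = nb;
        w->cap = len;
    }

    size_t keep = 0;
    if (off >= w->start && off < end) {               /* overlap: slide it to the front */
        keep = (size_t)(end - off);
        memmove(w->buf, w->buf + (off - w->start), keep);
    }
    if (fseek(w->fp, off + (long)keep, SEEK_SET) != 0)
        return NULL;
    size_t got = fread(w->buf + keep, 1, w->cap - keep, w->fp);  /* fill the rest */
    w->reads++;
    w->start = off;
    w->have = keep + got;
    return w->have >= len ? w->buf : NULL;
}
```

Compared with allocate/read/release per block, each `fread` here fills the whole window, so several consecutive small requests are served from memory.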

I still have to test the current code to verify whether this is the root cause.
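One rough way to test this is to time each stage in isolation with the standard C `clock()` function, which reports processor time used by the program; a profiler such as gprof gives a fuller picture. In the sketch below, `cpu_seconds` is a hypothetical helper and `struct work`/`sum_work` are a hypothetical stand-in workload where the checksum or read step under test would go.

```c
#include <stddef.h>
#include <time.h>

/* Return the processor time, in seconds, used while running fn(arg),
 * or -1.0 if clock() is unavailable. */
static double cpu_seconds(void (*fn)(void *), void *arg)
{
    clock_t t0 = clock();
    fn(arg);
    clock_t t1 = clock();
    if (t0 == (clock_t)-1 || t1 == (clock_t)-1)
        return -1.0;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Hypothetical workload: sum a buffer reps times. */
struct work {
    const unsigned char *buf;
    size_t len;
    int reps;
    unsigned long result;
};

static void sum_work(void *p)
{
    struct work *w = p;
    unsigned long total = 0;
    for (int r = 0; r < w->reps; r++)
        for (size_t i = 0; i < w->len; i++)
            total += w->buf[i];
    w->result = total;   /* store the result so the loop is not optimized away */
}
```

Running the same input through both reading strategies and comparing the returned times would show whether the read path accounts for the extra CPU.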

I would also appreciate any advice or suggestions on what drives the
CPU usage.

> 
-- 
Daniel Li




More information about the rsync mailing list