Future RSYNC enhancement/improvement suggestions

David Bolen db3l at fitlinxx.com
Mon Apr 22 16:28:01 EST 2002


(I wrote about large files taking 20-30 minutes to checksum with no
network traffic)

Jason Haar [Jason.Haar at trimble.co.nz] writes:

> ...But then you should have a dialup timeout of 1 hour set?

Oh, of course - I was more responding to Martin's comment that there
is enough traffic in general during an rsync session, since there are
cases where you can have lengthy periods without any traffic at all.

I could also see some NAT boxes holding a particular stream for far
less than an hour by default, but I don't have a specific data point
for that, so perhaps I'm just being too conservative.

> I think the problem is that you're morally upset that rsync spends so
> much time sending no network traffic. Quite understandable ;-)

Not sure about morally, but definitely financially :-)

> What about separating the tree into subtrees and rsyncing them? That
> means you go from:
>
> 1> dialup connection started [quick]
> 2> rsync generates checksums (no network traffic) [slow]
> 3> rsync transmits files 

Perhaps you misunderstood - the checksum generation time that was
taking so long was on a *single* file level.  Rsync had already
exchanged file lists and chosen the files to transfer - it was working
on a single file and generating the block checksums on the receiver
side to send over to the sender side.
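
As a rough illustration of that per-file step, here is a minimal
sketch of generating block checksums for one file's contents. It is
not rsync's actual code: zlib.adler32 and hashlib.md5 are stand-ins
for rsync's own weak (rolling) and strong checksums, and the
block_checksums name and the 700-byte block size are assumptions made
for the example.

```python
import hashlib
import zlib
from typing import Iterator, Tuple


def block_checksums(
    data: bytes, block_size: int = 700
) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (block_index, weak_checksum, strong_digest) per block.

    Hypothetical helper: adler32 and md5 are stand-ins, not the
    checksums rsync itself uses.
    """
    for index, offset in enumerate(range(0, len(data), block_size)):
        block = data[offset:offset + block_size]
        weak = zlib.adler32(block)             # cheap per-block checksum
        strong = hashlib.md5(block).digest()   # stronger per-block digest
        yield index, weak, strong


if __name__ == "__main__":
    sample = b"x" * 1500  # three blocks: 700 + 700 + 100 bytes
    for index, weak, strong in block_checksums(sample):
        print(index, hex(weak), strong.hex())
```

Because the function is a generator, each block's checksums are
available as soon as that block has been read, rather than only after
the whole file has been processed.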

(As it turns out, the transfers in question were for a single directory
normally made up of two files - a database file and its transaction
log)

The real rub was that after spending 20+ minutes with an idle line
computing the checksums, it would then take another 30+ minutes to
transmit the checksum information.  So it was (and likely still is) a
case where sending the data as it was computed would have been a major
win.  At least for slow connections, checksum computation is unlikely
to be the bottleneck compared with network transmission, so leaving
the network idle is wasted time that could be fully reclaimed.
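
The arithmetic behind that can be sketched as follows. This is an
idealized model only: it takes 20 and 30 minutes as illustrative
values for the "20+" and "30+" figures above, ignores startup latency
and protocol overhead, and the function names are made up for the
example.

```python
def sequential_minutes(compute: float, transmit: float) -> float:
    """Compute all checksums first, then transmit them."""
    return compute + transmit


def overlapped_minutes(compute: float, transmit: float) -> float:
    """Transmit checksums while they are still being computed.

    Idealized: the slower of the two stages sets the total time.
    """
    return max(compute, transmit)


compute, transmit = 20.0, 30.0  # illustrative values, in minutes
before = sequential_minutes(compute, transmit)
after = overlapped_minutes(compute, transmit)
print(before)           # 50.0
print(after)            # 30.0
print(before - after)   # 20.0
```

Under these assumptions the saving equals the whole computation time,
since the link is the slower stage and would never need to sit idle.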

I may still look into that sort of change, but I haven't had the
cycles yet, and it became less pressing once our checksum time
decreased - although this particular discussion has started me
thinking about it again.  I may review our current logs to see how
much time is being wasted.

-- David

/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/



