Future RSYNC enhancement/improvement suggestions

tim.conway at philips.com
Fri Apr 19 16:42:02 EST 2002


The problem with cached checksums is that, unless the filesystem driver
regenerates them as the filesystem is modified, they're meaningless on a
live filesystem.  I ran into a similar problem with huge trees on slow
NAS, and finally wrote my own system.  It does no checksumming; instead
it acts like rsync -W (if timestamp and size match, we're done), and it
sends everything in chunks: a list of non-directories to unlink, a list
of directories to rmdir (in depth order, of course), and a gzipped tar,
8Mb at a time.
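
A minimal Python sketch of the planning step of such a scheme, assuming both
sides can produce a listing of relative path -> (size, mtime, is_dir). The
names `Entry`, `Listing`, `plan_sync` and `batches` are hypothetical, not part
of rsync or of Tim's tool. Batching by uncompressed file size is also an
assumption; the message only says the gzipped tar goes out 8Mb at a time.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Entry:
    size: int       # bytes
    mtime: int      # seconds since the epoch
    is_dir: bool

Listing = Dict[str, Entry]  # hypothetical: relative path -> metadata

def plan_sync(src: Listing, dst: Listing) -> Tuple[List[str], List[str], List[str]]:
    """Return (to_unlink, to_rmdir, to_send) using a size+mtime quick check."""
    # Non-directories on the destination that vanished from the source
    # (or turned into directories there) are unlinked.
    to_unlink = sorted(
        p for p, e in dst.items()
        if not e.is_dir and (p not in src or src[p].is_dir)
    )
    # Directories that vanished (or turned into files) are removed deepest
    # first, so each rmdir only sees a directory that is already empty.
    to_rmdir = sorted(
        (p for p, e in dst.items()
         if e.is_dir and (p not in src or not src[p].is_dir)),
        key=lambda p: (-p.count("/"), p),
    )
    # Quick check, no checksums: a file is sent unless size and mtime
    # (and type) already match on the destination.
    to_send = sorted(
        p for p, e in src.items()
        if not e.is_dir and (p not in dst or dst[p] != e)
    )
    return to_unlink, to_rmdir, to_send

def batches(paths: List[str], src: Listing,
            limit: int = 8 * 1024 * 1024) -> List[List[str]]:
    """Group files into batches whose total size stays within `limit`;
    a single file larger than `limit` gets a batch of its own."""
    out: List[List[str]] = []
    current: List[str] = []
    total = 0
    for p in paths:
        size = src[p].size
        if current and total + size > limit:
            out.append(current)
            current, total = [], 0
        current.append(p)
        total += size
    if current:
        out.append(current)
    return out
```

Each batch would then be packed into one gzipped tar (which also recreates any
missing parent directories), after the unlink and rmdir lists have been
applied on the receiving side.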

Tim Conway
tim.conway at philips.com
303.682.4917
Philips Semiconductor - Longmont TC
1880 Industrial Circle, Suite D
Longmont, CO 80501
Available via SameTime Connect within Philips, n9hmg on AIM
perl -e 'print pack(nnnnnnnnnnnn, 
19061,29556,8289,28271,29800,25970,8304,25970,27680,26721,25451,25970), 
".\n" '
"There are some who call me.... Tim?"

Jan Rafaj <rafaj at cedric.vabo.cz>
Sent by: rsync-admin at lists.samba.org
04/19/2002 04:23 AM

 
        To:     <rsync at samba.org>
        cc:     <rafaj at cedric.vabo.cz>
(bcc: Tim Conway/LMT/SC/PHILIPS)
        Subject:        Future RSYNC enhancement/improvement suggestions
        Classification: 

Hello,

Recently, while using rsync to mirror a large (several GB) archive on a
regular basis, I came across several problems and also got some ideas
about their possible solutions. Could you please investigate and
consider implementing the features described below in future rsync
releases?

- when the checksumming stage of rsync runs slowly (consider a very
  large archive, several GB), taking ~3 minutes or more, as happens on
  machines with slower CPUs or with older HDDs that have no UDMA or only
  UDMA33 transfer modes, one can often observe that the network
  connection to the master site shuts down and the mirroring fails (in
  subsequent mirroring attempts, when, e.g., about 90% of the archive
  has already been transferred). I think this happens because the
  bidirectionally open connection is simply reset by either the client
  or the server: rsync does not transfer anything while the
  checksumming runs (I might be wrong, but this is what I observed),
  and the TCP connection is reset because of the stall (I have no clue
  by what means, because I'm no TCP/IP expert, but I suspect it might
  be TCP/IP itself).
  How about keeping the checksums separately on the HDD in a
  Berkeley-style database and, on subsequent mirroring attempts, looking
  them up there, so that rsync does not need to checksum the whole
  target (already mirrored) file tree? Implementing this could take
  some time, but it would certainly improve rsync's responsiveness and
  make it easier to use with slow CPUs and HDDs.

- make rsync's error and status messages uniform, so that they can be
  easily parsed by scripts (they are not right now, as of rsync 2.5.5)

- if the network connection between the rsync client and server stalls
  for some reason, perhaps implement something like a 'TCP keepalive'
  feature?
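
The checksum cache suggested in the first item could look roughly like the
Python sketch below. It uses Python's `dbm` module as a stand-in for a
Berkeley DB file and full-file MD5 as a stand-in for whatever checksum rsync
itself computes; `file_digest` and `cached_digest` are hypothetical names. A
cached digest is trusted only while the file's size and mtime are unchanged,
so a file rewritten with the same size and timestamp would still get a stale
answer, which is exactly the live-filesystem objection raised in the reply.

```python
import dbm
import hashlib
import json
import os

def file_digest(path: str, bufsize: int = 1 << 20) -> str:
    """Full-file MD5, read in 1 MiB pieces (stand-in for rsync's checksum)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()

def cached_digest(db, path: str) -> str:
    """Return the digest of `path`, recomputing it only when the file's size
    or mtime differs from what was recorded. `db` is an open dbm handle,
    e.g. from dbm.open("checksums", "c") (hypothetical file name)."""
    st = os.stat(path)
    key = os.path.abspath(path).encode()
    raw = db.get(key)
    if raw is not None:
        rec = json.loads(raw)
        if rec["size"] == st.st_size and rec["mtime_ns"] == st.st_mtime_ns:
            return rec["digest"]  # cache hit: the file itself is not read
    digest = file_digest(path)
    # dbm stores bytes; str values are encoded automatically.
    db[key] = json.dumps({"size": st.st_size,
                          "mtime_ns": st.st_mtime_ns,
                          "digest": digest})
    return digest

if __name__ == "__main__":
    with dbm.open("checksums", "c") as db:
        print(cached_digest(db, __file__))
```

On a repeat run over an unchanged tree, only `os.stat` and a database lookup
are performed per file, which is where the time savings on slow disks would
come from.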

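For the keepalive item, the socket-level mechanism can be sketched in Python
as below; `enable_keepalive` and its default timings are hypothetical choices,
not anything rsync provides. Keepalive probes are sent only while the
connection is idle; whether they would have prevented the resets described
above depends on what was actually dropping the connection (for example an
idle timeout on a NAT box or firewall), which the message does not establish.

```python
import socket

def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, count: int = 5) -> None:
    """Turn on TCP keepalive probes and tune their timing where possible."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timing options are not defined on every platform,
    # so set only those this Python build exposes.
    for name, value in (("TCP_KEEPIDLE", idle),      # idle secs before 1st probe
                        ("TCP_KEEPINTVL", interval),  # secs between probes
                        ("TCP_KEEPCNT", count)):      # failed probes before reset
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
```

The options must be set on the connected (or about-to-connect) socket on
either end; without the timing overrides, many systems wait about two hours of
idleness before sending the first probe.
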
I know these are only suggestions; I have neither the resources nor the
knowledge to implement them in rsync myself (though I am plagued by the
problems described), so I'm sending these ideas to you in the hope that
they will be useful and can be implemented in the future.

Please let me know your opinion about this.

Thanks & regards,

Jan


More information about the rsync mailing list