cut-off time for rsync ?

Ken Chase rsync-list-m829 at sizone.org
Thu Jul 2 07:47:55 MDT 2015


On Wed, Jul 01, 2015 at 02:05:50PM +0100, Simon Hobson said:

  > As I read this, the default is to look at the file size/timestamp and if
  > they match then do nothing as they are assumed to be identical. So unless
  > you have specified this, then files which have already been copied should be
  > ignored - the check should be quite low in CPU, at least compared to the
  > "cost" of generating a file checksum etc.

This overlooks the issue: most rsync users don't abuse rsync for backups the
way us idiots do! :) You have NO IDEA how long it takes to scan 100M files
on a 7200 rpm disk. The scan becomes the dominant cost - CPU isn't the issue at
all. (Besides, I'd expect metadata scanning to max out only two cores anyway:
one for rsync's userland work and one for the kernel walking the filesystem's
inodes.)

This is why throwing away all that metadata seems silly. Keeping detailed logs
and parsing them before the copy would help, but it requires an external
selection script that runs before rsync starts and hands rsync a list of files
to copy directly. Unfortunate, because rsync's scan method is quite advanced,
but it doesn't avoid this pitfall.
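A minimal sketch of that external-selection idea, using rsync's real
--files-from option. The paths, the timestamp-file convention, and the
destination host are all hypothetical; this assumes modification times on the
source are trustworthy, which is exactly the kind of metadata a log-based
approach could replace:

```shell
#!/bin/sh
# Pre-select changed files so rsync skips the full-tree metadata scan.
# All paths below are examples, not a recommendation.
SRC=/data
STAMP=/var/tmp/last-backup.stamp
LIST=/tmp/changed-files.txt

# Collect files modified since the last run, as paths relative to $SRC
# (rsync --files-from expects paths relative to the source argument).
( cd "$SRC" && find . -type f -newer "$STAMP" ) > "$LIST"

# Hand rsync the pre-built list instead of letting it walk the tree.
rsync -a --files-from="$LIST" "$SRC/" backuphost:/backups/data/

# Mark this run's cut-off for next time.
touch "$STAMP"
```

The obvious weakness is that find still has to stat every inode, so this only
wins if the file list comes from somewhere cheaper (application logs, inotify,
etc.) rather than a fresh scan.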

Additionally, I don't know whether Linux (or FreeBSD, or any Unix) can be told
to cache metadata more aggressively than data - there's not much point caching
the latter on a backup server, but the former would be great. I also don't know
how much RAM metadata takes per inode on typical OSes.
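On Linux at least there is one real knob in this direction: the
vm.vfs_cache_pressure sysctl, which biases reclaim between the dentry/inode
(metadata) caches and the page cache. The value 50 below is an illustrative
guess, not a tuned recommendation:

```shell
# Linux-only: show and lower the kernel's eagerness to reclaim
# dentry/inode caches relative to page-cache data. Default is 100;
# lower values keep metadata cached longer (0 can risk OOM).
sysctl vm.vfs_cache_pressure
sudo sysctl -w vm.vfs_cache_pressure=50
```

It doesn't pin metadata in RAM outright, but on a backup box where the data
pages are write-once it may keep a warm inode cache across scans.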

/kc
-- 
Ken Chase - ken at heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.
