Need a hint for my question regarding the workings of rsync.

Karl O. Pinc kop at meme.com
Wed Nov 13 11:59:08 MST 2013


On 11/13/2013 12:03:21 PM, Kevin Korb wrote:
> OK, in the case of using v3 with --link-dest and not --checksum, most
> of the initial activity on the sender would be calls to stat() to
> index what is there.
> 
> The receiving side would be doing 2x the stat() calls (you have 2
> --link-dest dirs for it to check) and link() calls every time it
> finds a matching file.

Am I correct in my impression that the sender and receiver
are doing the above serially, not concurrently?
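
For concreteness, the kind of run I have in mind is roughly the
following; the snapshot paths are just placeholders:

    # hypothetical nightly run: hard-link unchanged files against the
    # two most recent snapshots instead of copying them again
    rsync -aH \
        --link-dest=/backups/2013-11-11/ \
        --link-dest=/backups/2013-11-12/ \
        /home/ /backups/2013-11-13/home/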

> stat() is an expensive call in terms of time spent (especially when
> multiplied by millions of files) but it doesn't really translate into
> much disk IO since it is such a small amount of actual data.  The
> link() call is pretty much the same except it is a write op instead
> of a read op.  So, you wouldn't show much MB/sec usage of your disks
> until rsync found a new or different file but there would be many
> small operations.
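
To see that mix for myself I would probably just run a small test
tree under strace, something like:

    # tally syscall counts and times instead of showing bulk I/O;
    # -f follows the forked receiver process
    strace -f -c rsync -aH --link-dest=/backups/2013-11-12/ \
        /testsrc/ /testdst/

which should show mostly lstat() and link() calls rather than large
reads and writes.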

My thought is to save wall time by increasing concurrency.

No doubt there are tradeoffs involved.  If forced to choose between
features, what I really want is entirely different: for -H to have
"priority" over --link-dest, so that when the filesystem exceeds its
hard-link limit the end result is that the -H links exist and the
--link-dest links do not.  Future --link-dest operations would then
work and, most importantly, the result of the running rsync operation
would be a good copy of the source.  This would allow many
--link-dest-ed backups of a filesystem used by hardlink-happy
applications (like yum).
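
To sketch the failure mode I have in mind (the 65000 figure below is
ext4's per-inode link limit; other filesystems differ, and the paths
are made up):

    # Nightly snapshots of a tree whose files are already heavily
    # hard-linked (yum's caches, say).  Every new snapshot adds more
    # links to the same backup-side inode, one per hard-linked name,
    # so after enough runs (approaching 65000 links) link() fails
    # with EMLINK.
    rsync -aH \
        --link-dest=/backups/2013-11-12/ \
        /var/cache/yum/ /backups/2013-11-13/var/cache/yum/
    # What I would want when that happens: keep the -H links within
    # the new snapshot (on a fresh, freshly-copied inode) and simply
    # skip the --link-dest link back to the older snapshots.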


Karl <kop at meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein
