Need hint for my question regarding the working of rsync.
Karl O. Pinc
kop at meme.com
Wed Nov 13 11:59:08 MST 2013
On 11/13/2013 12:03:21 PM, Kevin Korb wrote:
> OK, in the case of using v3 with --link-dest and not --checksum most
> of the initial activity on the sender would be doing calls to stat()
> to index what is there.
>
> The receiving side would be doing 2x the stat() calls (you have 2
> --link-dest dirs for it to check) and link() calls every time it
> finds a matching file.
Am I correct in my impression that the sender and receiver
are doing the above serially, not concurrently?
> stat() is an expensive call in terms of time spent (especially when
> multiplied by millions of files) but it doesn't really translate into
> much disk IO since it is such a small amount of actual data. The
> link() call is pretty much the same except it is a write op instead
> of a read op. So, you wouldn't show much MB/sec usage of your disks
> until rsync found a new or different file but there would be many
> small operations.
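To make the description above concrete, here is a simplified model of the
receiver's per-file work with a single --link-dest directory. This is a
sketch, not rsync's actual code: the function name and the size/mtime
comparison are illustrative stand-ins for rsync's quick-check. It shows why
a matching file costs only a stat() plus a link() (tiny metadata operations)
while a new or changed file triggers real data IO:

```python
import os
import shutil

def receive_file(src_path, rel_path, link_dest, dest):
    """Simplified model of the --link-dest receiver step for one file.

    Compares size and mtime of the candidate under link_dest against
    the source file; on a match, makes a hardlink (a metadata-only
    write), otherwise falls back to copying the file data.
    """
    src_stat = os.stat(src_path)                 # sender-side stat()
    candidate = os.path.join(link_dest, rel_path)
    target = os.path.join(dest, rel_path)
    os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
    try:
        st = os.stat(candidate)                  # receiver-side stat()
        if (st.st_size == src_stat.st_size
                and int(st.st_mtime) == int(src_stat.st_mtime)):
            os.link(candidate, target)           # cheap metadata write
            return "linked"
    except FileNotFoundError:
        pass
    shutil.copy2(src_path, target)               # actual data IO
    return "copied"
```

With two --link-dest directories, the try/stat step simply runs once per
directory before falling back to the copy, which is where the "2x the
stat() calls" comes from.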
My thought is to save wall time by increasing concurrency.
No doubt there are tradeoffs involved. If forced to
choose between features, what I really want is entirely
different: for -H to have "priority" over --link-dest,
so that when the fs hits its hardlink limit the end
result is that the -H links exist and the --link-dest
links do not. Future --link-dest operations would then
work and, most importantly, the result of the running
rsync operation would be a good copy of the source.
This would allow many --link-dest-ed backups of a fs
used by hardlink-happy applications. (Like yum.)
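The failure mode being worked around can be sketched as follows. This is
an assumed helper, not anything rsync provides: when a filesystem's
per-inode hardlink limit is exceeded, link(2) fails with EMLINK, and a
receiver that gave -H links priority could catch that error on the
--link-dest link and copy the data instead, so the run still ends with a
good copy of the source. (The `link` parameter exists only to make the
EMLINK path demonstrable.)

```python
import errno
import os
import shutil

def link_or_copy(candidate, target, link=os.link):
    """Hardlink candidate to target, falling back to a real copy when
    the filesystem's per-inode hardlink limit is hit (EMLINK).

    The 'link' parameter is injectable only so the failure path can be
    exercised; real code would call os.link directly.
    """
    try:
        link(candidate, target)
        return "linked"
    except OSError as e:
        if e.errno != errno.EMLINK:
            raise
        # Give up the space saving for this file, but keep a good copy.
        shutil.copy2(candidate, target)
        return "copied"
```

The cost of the fallback is only extra disk space for the files past the
link limit; the backup itself stays complete and usable as a future
--link-dest reference.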
Karl <kop at meme.com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein