Need hint for my question regarding the working of rsync.

Kevin Korb kmk at sanitarium.net
Wed Nov 13 11:03:21 MST 2013


OK, in the case of using v3 with --link-dest and without --checksum,
most of the initial activity on the sender would be calls to stat()
to index what is there.

The receiving side would be doing up to 2x the stat() calls (you have 2
--link-dest dirs for it to check) plus a link() call every time it finds
a matching file.
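
To make that concrete, here is a rough sketch of the receiver's per-file
work, written in Python rather than taken from rsync's C source. The
function names (quick_check_match, place_file) are made up for
illustration, and the comparison is simplified to size and mtime; real
rsync also wants the other preserved attributes to match before it
links.

```python
import os

def quick_check_match(src_stat, candidate_path):
    """Simplified quick check: same size and same mtime (whole seconds)."""
    try:
        st = os.stat(candidate_path)  # one stat() per --link-dest dir tried
    except FileNotFoundError:
        return False
    return (st.st_size == src_stat.st_size
            and int(st.st_mtime) == int(src_stat.st_mtime))

def place_file(rel_path, src_stat, link_dests, dest_root):
    """Hard-link rel_path from the first matching link-dest dir.

    Returns the link-dest dir that was used, or None if the file has to
    be transferred instead.
    """
    target = os.path.join(dest_root, rel_path)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    for base in link_dests:
        candidate = os.path.join(base, rel_path)
        if quick_check_match(src_stat, candidate):
            os.link(candidate, target)  # metadata write only, no file data copied
            return base
    return None
```

With two link-dest dirs, a file that matches in the first one costs one
stat() and one link(); a file that matches in neither costs two stat()
calls and then has to be transferred.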

There wouldn't be any checksumming or hashing or serious data transfer
until the incremental indexing stats something that is either not in
the receiver's link-dests or is different in them.  Then rsync would
either send the whole file and a checksum of it or it would begin
hashing both versions, transferring those hashes to find out what is
different, then sending the different pieces and a whole file checksum
so the receiver can make sure it put the file together correctly.
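
The second case (hash both versions, send only the differences, then
verify) can be sketched roughly as below. This is a toy in Python, not
rsync's implementation: the block size is tiny, zlib.adler32 is
recomputed at every offset instead of being rolled, MD5 is only a
stand-in for whatever strong and whole-file checksums your rsync
version uses, and the function names are made up.

```python
import hashlib
import zlib

BLOCK = 4  # tiny block size for illustration; rsync picks a much larger one

def signatures(old: bytes):
    """Receiver: hash each block of its old copy, keyed weak -> strong -> index."""
    sigs = {}
    for i in range(0, len(old), BLOCK):
        block = old[i:i + BLOCK]
        weak = zlib.adler32(block)
        sigs.setdefault(weak, {})[hashlib.md5(block).digest()] = i // BLOCK
    return sigs

def delta(new: bytes, sigs):
    """Sender: describe its file as block references plus literal data."""
    out, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + BLOCK]
        strong = sigs.get(zlib.adler32(window))  # cheap check first
        idx = strong.get(hashlib.md5(window).digest()) if strong else None
        if idx is not None:
            if literal:
                out.append(("data", bytes(literal)))
                literal = bytearray()
            out.append(("copy", idx))  # receiver already has this block
            pos += len(window)
        else:
            literal.append(new[pos])  # no match here, slide forward one byte
            pos += 1
    if literal:
        out.append(("data", bytes(literal)))
    return out, hashlib.md5(new).hexdigest()  # whole-file checksum

def rebuild(old: bytes, instructions, whole_sum):
    """Receiver: reassemble the file and verify it against the whole-file checksum."""
    parts = []
    for op, arg in instructions:
        parts.append(old[arg * BLOCK:(arg + 1) * BLOCK] if op == "copy" else arg)
    result = b"".join(parts)
    if hashlib.md5(result).hexdigest() != whole_sum:
        raise ValueError("whole-file checksum mismatch")
    return result
```

Only the "data" pieces and the small "copy" references cross the wire,
which is why this phase is the first point where both sides actually
read file contents.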

stat() is an expensive call in terms of time spent (especially when
multiplied by millions of files) but it doesn't really translate into
much disk IO since it is such a small amount of actual data.  The
link() call is pretty much the same except it is a write op instead of
a read op.  So, you wouldn't show much MB/sec usage of your disks
until rsync found a new or different file but there would be many
small operations.
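
If you want to see the "lots of time, very little data" effect for
yourself, something like the following sketch walks a tree and stats
every entry without reading any file contents. The function name
index_tree is made up, and os.walk does some directory reads of its
own, so treat the timing as a rough indication only.

```python
import os
import time

def index_tree(root):
    """lstat() every entry under root; return (entries, elapsed_seconds)."""
    count = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))  # metadata only, no file data read
            count += 1
    return count, time.perf_counter() - start
```

Running index_tree("/srv/backups/janus/2013-11-13-02-08") while
watching vmstat should show many small reads (or almost none once the
inodes are cached) rather than high MB/sec.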


On 11/13/13 11:22, Karl O. Pinc wrote:
> On 11/12/2013 04:25:53 PM, Kevin Korb wrote:
>> First, are you talking about --checksum checksums or the hashing
>> of files that are different on both ends so that only the
>> differences need to be transferred?  You seem to be talking about
>> the latter while describing the performance of the former.
> 
> Attached is the output, on the rsync source and destination systems,
> of commands like:
> 
> vmstat 1 | awk '{print "destsys " strftime("%H:%M:%S") " " $0;}' \
>   > /tmp/destlog
> 
> and
> 
> printf '' > /tmp/destrlog ; while sleep 1 ; do ps axwww --forest | \
>   grep rsync | awk '{print "destsys " strftime("%H:%M:%S") " " $0;}' \
>   >> /tmp/destrlog ; done
> 
> ---
> 
> At 9:00:06 the rsync command starts on the system to which the data
> is being transferred (destrlog):
> 
> rsync --rsh=ssh -4l root -i /etc/rsync.d/slate_pull_key --ipv4
> --one-file-system --archive --hard-links --quiet --numeric-ids
> --sparse --link-dest /srv/backups/janus/2013-11-13-02-08/
> --link-dest /srv/backups/janus/2013-11-13-00-24/
> root@janus.meme.com::pull-backup//
> /srv/backups/janus/2013-11-13-09-00/
> 
> At 9:00:07 this can be seen on the source side (invoked via an ssh
> authorized_keys file)  (srcrlog):
> 
> rsync --server --daemon .
> 
> From 9:00:06 to 9:00:42 the source side is reading the disk. At
> 9:00:42 the destination side begins to read the disk.
> 
> This is not a great example since the source side is running other
> processes and the destination side is much faster than the source,
> but usually what I'll see at this point is that the source side
> becomes more or less idle and waits for the destination side to
> finish reading.  In this case the destination side does this in
> about 2 seconds so it's hard to see.  You can see on the source
> side that the disk io has dropped and the cpu is spending less
> time waiting for disk.
> 
> The whole thing is a lot more apparent when either the filesystems
> are very large or the systems very slow. In these cases it would
> save wall time if the source and destination did what appears to be
> "initial reading" in parallel.
> 
> 
> At about 9:00:46 I start to get steady activity on both the
> source and destination sides.  Most of the --link-dest directory
> content is already on the dest side so there's not a lot for the
> dest side to do.
> 
> At about 9:01:49 the rsync command finishes (destrlog).
> 
> ---
> 
> The awk is gawk.  The systems are debian 7.  The rsync is:
> 
> $ rsync --version
> rsync  version 3.0.9  protocol version 30
> Copyright (C) 1996-2011 by Andrew Tridgell, Wayne Davison, and others.
> Web site: http://rsync.samba.org/
> Capabilities:
>     64-bit files, 64-bit inums, 32-bit timestamps, 64-bit long ints,
>     socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace,
>     append, ACLs, xattrs, iconv, symtimes
> 
> 
> Karl <kop at meme.com>
> Free Software:  "You don't pay back, you pay forward."
>                  -- Robert A. Heinlein
> 

-- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~
	Kevin Korb			Phone:    (407) 252-6853
	Systems Administrator		Internet:
	FutureQuest, Inc.		Kevin at FutureQuest.net  (work)
	Orlando, Florida		kmk at sanitarium.net (personal)
	Web page:			http://www.sanitarium.net/
	PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~