Rsync help needed...

Matt McCutchen hashproduct at verizon.net
Sat Feb 25 00:56:37 GMT 2006


On Fri, 2006-02-24 at 18:40 -0500, Linus Hicks wrote:
> I did something similar to what lsk is doing a few months back, I believe using 
> rsync 2.6.5. I wrote a script to query the database for all the datafiles and 
> rsync'ed them individually by specifying the full path to the file. What I found 
> was that if I didn't use --no-whole-file, it did operate in whole-file mode. I 
> was not doing local transfers, so is there some other condition that causes it 
> to default to whole-file mode?

Not that I know of.  But according to the OLDNEWS file in the
distribution, a bug causing whole-file mode to be the default even for
remote transfers was fixed between 2.5.4 and 2.5.5.  Is it possible that
the rsync on one or both ends was 2.5.4 or older?

(For reference: rsync treats a transfer between two paths in the local
filesystem as local, even if one or both paths are on NFS or a similar
network filesystem.  This makes sense because limiting "disk" I/O
(really network filesystem I/O) matters more than limiting traffic
between the two rsync processes, which talk over a fast local
connection.)
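
A rough way to picture that rule is the Python sketch below.  It only
approximates how rsync classifies its path arguments (an rsync:// URL,
or a colon before the first slash, names a remote host); the function
names are made up for illustration and are not part of rsync.

```python
def is_remote_spec(spec: str) -> bool:
    """Guess whether an rsync path argument names a remote host.

    Simplified assumption: "rsync://..." URLs are remote, and so is
    anything with a colon before the first slash ("host:path" or
    "host::module").  Everything else is a local filesystem path,
    NFS mounts included.
    """
    if spec.startswith("rsync://"):
        return True
    colon = spec.find(":")
    if colon == -1:
        return False
    slash = spec.find("/")
    return slash == -1 or colon < slash


def defaults_to_whole_file(src: str, dest: str) -> bool:
    """Whole-file mode is the default only when both ends are local."""
    return not is_remote_spec(src) and not is_remote_spec(dest)
```

Under this sketch, a copy from an NFS mount point to a local directory
still counts as local, so --no-whole-file would be needed to get the
incremental algorithm.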

> The issue of not using --inplace and atomically moving it over the original is 
> complicated by using --temp-dir. lsk has not raised the issue of not having 
> enough room for a second copy of any of his datafiles, so he probably isn't 
> using --temp-dir. However, the statement you made earlier in this thread (quoted 
> below) needs to be extended to account for the case where a --temp-dir resides 
> on a different partition:
> 
> "Not exactly: if --inplace is not used, rsync will write a temporary file
> and atomically move it over the original.  --inplace uses less disk
> space but does not provide atomicity and, according to the man page,
> reduces the efficiency of the incremental transfer algorithm."

The behavior of rsync with a temp dir on a different partition changed
in 2.6.7.  See this request for enhancement:
	https://bugzilla.samba.org/show_bug.cgi?id=3461
The man page of CVS rsync 2.6.7 now has a detailed discussion of the
issue.  You can read the man page here:
	http://cvs.samba.org/cgi-bin/cvsweb/rsync/rsync.yo

> And a performance question: would it be faster to pass the complete list of 
> datafiles to rsync in one fell swoop, for instance using --files-from rather 
> than running rsync individually on each one?

It would be somewhat faster to pass the entire list because you incur
the overhead of setting up the rsync process triangle (generator,
sender, and receiver) once, not for every file.  Furthermore, the rsync
protocol is pipelined.  If you have a network with high bandwidth but
considerable latency, a single rsync run takes advantage of the
pipelining, while a separate run for each file waits for several
network round trips per file.
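
To make the comparison concrete, here is a small Python sketch that
builds both kinds of command line without running anything.  The host
name "standby", the datafile paths, and the list file location are
hypothetical; only the --files-from and --no-whole-file options come
from rsync itself.

```python
def per_file_commands(files, dest):
    """One rsync invocation per datafile: process setup and several
    network round trips are paid again for every file."""
    return [["rsync", "--no-whole-file", path, dest] for path in files]


def file_list_text(files):
    """Contents for a --files-from list: one path per line."""
    return "".join(path + "\n" for path in files)


def single_command(list_file, src_root, dest):
    """One rsync invocation for the whole list.

    The paths in list_file are taken relative to src_root, so with
    src_root "/" the full paths from the database query can be used
    as they are.
    """
    return ["rsync", "--no-whole-file",
            "--files-from=" + list_file, src_root, dest]


if __name__ == "__main__":
    datafiles = ["/u01/oradata/system01.dbf", "/u02/oradata/users01.dbf"]
    print(per_file_commands(datafiles, "standby:/backup/"))
    print(file_list_text(datafiles), end="")
    print(single_command("/tmp/datafiles.txt", "/", "standby:/"))
```

Note that --files-from keeps the listed path under the destination, so
the two variants do not necessarily put files in the same place; check
the layout with --dry-run before relying on either one.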
-- 
Matt McCutchen
hashproduct at verizon.net
http://hashproduct.metaesthetics.net/


