Suggestion: rsync and direct IO

David Sisson David.Sisson at av.com
Wed Dec 4 23:40:59 EST 2002


	When we copy indexes, there are cases where we'd like rsync to bypass
the system cache while it writes a chunk of data received from another
machine.  I am probably going to modify our own copy of rsync to do this.
If we could write directly to the disk using direct I/O (or read that
way, for that matter), we could avoid polluting the operating system's
buffer cache before we're ready to use the new data.  The feature isn't
tied to direct I/O specifically, so I'd call it something like
-avoid-system-cache.  I will probably make the feature optional, because
I can imagine wanting a transfer to go through the system file cache in
some situations and not in others.
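
	To make the idea concrete, here is a minimal sketch in Python rather
than a patch to rsync.  It does not use direct I/O at all: it swaps in a
different mechanism, fsync() followed by posix_fadvise(POSIX_FADV_DONTNEED),
which only hints to the kernel that the freshly written pages can be
dropped.  The function name, chunk size and window size are assumptions
made up for illustration, and the hint is skipped on platforms that lack
posix_fadvise.

```python
import os


def copy_avoiding_cache(src_path, dst_path, chunk=1 << 20, window=32 << 20):
    """Copy src to dst, dropping dst's cached pages as the copy proceeds.

    Hypothetical helper, not rsync code. It relies on posix_fadvise(),
    which is advisory and is NOT the same thing as O_DIRECT.
    """
    can_advise = hasattr(os, "posix_fadvise")
    copied = 0   # bytes written so far
    dropped = 0  # bytes already flushed and hinted as droppable

    def drop(fd, start, length):
        # Dirty pages cannot be evicted, so force them to disk first.
        os.fsync(fd)
        if can_advise and length > 0:
            try:
                os.posix_fadvise(fd, start, length, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass  # only a hint; a refusal should not fail the copy

    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        fd = dst.fileno()
        while True:
            buf = src.read(chunk)
            if not buf:
                break
            dst.write(buf)
            copied += len(buf)
            if copied - dropped >= window:
                dst.flush()
                drop(fd, dropped, copied - dropped)
                dropped = copied
        dst.flush()
        drop(fd, dropped, copied - dropped)
    return copied
```

	Dropping pages one window at a time, instead of once at the end,
keeps a multi-gigabyte copy from filling the cache before the hint is
given.  The price is an fsync() per window, so the window size trades
throughput against how much cache the copy can occupy at once.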

	Sample use:  Here at AltaVista we copy large indexes around (multiple
gigabytes of data).  The new version of an index isn't usable until the
whole thing has arrived.  Meanwhile the old version is relying on the
operating system's buffer cache for performance, so you don't want its
pages pushed out of the cache until you're done with it.  Nor do you
need the new version of the index in the system file cache until you're
ready to use it.  This is where keeping the incoming file out of the
system cache while it is being copied is a good thing.

	Sample non-use:  You've got an FTP server where people can use rsync to
make mirrors of your site.  You expect lots of people to fetch the same
files from the mirror (along with normal users downloading them), so you
want rsyncd to leave the files in the cache for the next person
accessing your mirror.  (Although if you think about it, the
infrequently used files go through the cache just like the frequent
ones, when what you'd really want is to use the cache for the ones
already in memory and not for the infrequent ones -- bleah.)



	On another topic while I'm thinking about it -- does anyone know of a
reason you can't set socket options in rsync the way you can in rsyncd?
If you are pushing to the server rather than pulling from it, it'd be
nice to have a larger outgoing buffer size on the client (useful for
cross-country pushes).  We have a local version that has been modified
to set the buffer size, and it performs three times faster than the
unmodified version.
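
	For anyone who wants to see what the buffer change amounts to, here
is a rough sketch in Python, not our actual rsync modification.  It sets
SO_SNDBUF on a TCP socket and sizes it from the bandwidth-delay product,
the usual rule of thumb for long, fat links.  The helper names and the
100 Mbit/s and 70 ms figures are assumptions picked for illustration.

```python
import socket


def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes: the amount of data in flight
    needed to keep the link full for one round trip."""
    return bandwidth_bits_per_s * rtt_ms // 8000


def set_send_buffer(sock, size_bytes):
    """Request a send buffer of size_bytes and return what the kernel
    reports afterwards. The kernel may clamp or adjust the request, so
    the returned value can differ from size_bytes."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size_bytes)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    # Illustrative numbers only: a 100 Mbit/s path with a 70 ms round trip.
    wanted = bdp_bytes(100_000_000, 70)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    effective = set_send_buffer(sock, wanted)  # set before connect()
    print("requested", wanted, "bytes; kernel reports", effective)
    sock.close()
```

	The value the kernel reports is worth checking, since system-wide
limits can cap the request well below what a cross-country link needs.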




