multiple sessions to same destination

jw schultz jw at pegasys.ws
Thu Oct 10 21:08:00 EST 2002


On Thu, Oct 10, 2002 at 01:09:16PM -0700, Derek Simkowiak wrote:
> > Yeah, you're fine, as long as, as you say, no two sessions are accessing
> > the same objects.  Even then, rsync handles it fairly well...
> 
> 	Can you elaborate on this?
> 
> 	Below is a message I sent a while ago, but never got a response to.
> I'd like to know what you mean by "rsync handles it fairly well".  Any
> information you have would be greatly appreciated.
> 
> --
> Date: Thu, 26 Sep 2002 17:24:53 -0700 (PDT)
> Subject: Server writes...?
> 
> 
> 	I have a quick question about rsync's writing of files.
> 
> 	I have a team of people that all use the host BigServer, which is
> running rsync as a daemon, as a central place to keep all shared files
> backed up.  The "master copy" for any given file is considered to be the
> local file that somebody has worked on -- i.e., BigServer is NOT
> considered the master copy.  BigServer is the backup copy.  Team members
> back up their files to BigServer periodically with the rsync client.
> 
> 	Sometimes several team members work on the same file.  In this
> case, the team member who most recently rsync'd their local copy up to
> BigServer has a backup -- anyone else loses their "BigServer backup copy"
> the instant one of the other team members uploads their version.  (The
> semantics are exactly like a team sharing a single Samba share for
> backup.)
> 
> 	The files the team write might be big, as in several dozen or
> hundreds of megs.
> 
> 	Now to my question: What happens if three team members all try to
> write the same huge file to BigServer at "the same time"?  Meaning, the
> rsync daemon on BigServer gets three connections that start uploading
> "/shared_space/bigfile.mov", before any one of the connections has
> finished uploading its complete copy?
> 
> 	Is there any chance that the resulting "/shared_space/bigfile.mov"
> on BigServer would have a corrupted copy, because several clients were
> uploading at the same time?  Or, does the rsync daemon guarantee that the
> last person (the 3rd team member to connect to BigServer) gets to upload
> the final, uncorrupted version of "bigfile.mov" to BigServer?
> 
> 	Any help is greatly appreciated.  I've read all the docs I could
> find but did not see this addressed.

If you have multiple uploads happening at overlapping times
you could indeed get a corrupt file.  There is a window
between the time rsync generates the block and rolling
checksums and the time it updates the files.  The size of the
window is proportional to the size of the file tree.  If
something changes portions of a file between the change
calculation and the actual transfer, and those portions are
not overwritten by rsync, the file will be corrupted.

To spell it out:

	1. rsync compares file lists and finds potentially
	   changed files.

	2. rsync generates block and rolling checksums and
	   calculates changed portions of files.

	3. rsync transfers changed portions.

Each step is done for the entire tree being synced before
the next step is begun, so the duration of risk is proportional
to the size and volatility of the tree.  This is another
drawback of doing the whole tree as a batch instead of as a
rolling process or a directory at a time.

If a file is changed between steps 1 and 2, rsync might not
know the file had changed, so the tree will be out of sync.

If a change occurs to a file between steps 2 and 3 it may
cause corruption of the file like so:

	rsync [fred] detects that bytes 1232 - 1427 are changed

	something [ethel] (another rsync perhaps) changes bytes 1130 - 1291

	rsync [fred] updates bytes 1232 - 1427

	The file now matches fred except for
	bytes 1130 - 1231 which contain data from ethel
	
This example is a bit simplified but conveys the gist.
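
The fred/ethel sequence can be reproduced with a toy model.  The
Python sketch below is only an illustration: the 2000-byte file, its
contents and the function name are made up, and it patches a byte
range in place, which simplifies what rsync really does.  Only the
byte offsets come from the example above.

```python
def simulate():
    """Replay the fred/ethel race on an in-memory 'file' (hypothetical data)."""
    size = 2000
    server = bytearray(b"o" * size)            # copy currently on the server
    fred = bytearray(b"o" * size)              # fred's local master copy
    fred[1232:1428] = b"f" * (1428 - 1232)     # fred changed bytes 1232-1427

    # Step 2 (fred): compare with the server copy, remember the changed range.
    diff = [i for i in range(size) if server[i] != fred[i]]
    start, end = diff[0], diff[-1] + 1

    # In the window between steps 2 and 3, ethel rewrites bytes 1130-1291.
    server[1130:1292] = b"e" * (1292 - 1130)

    # Step 3 (fred): apply only the range that was computed earlier.
    server[start:end] = fred[start:end]

    # Bytes that still differ from fred's copy are ethel's leftovers.
    leftover = [i for i in range(size) if server[i] != fred[i]]
    return leftover[0], leftover[-1]


if __name__ == "__main__":
    print(simulate())   # (1130, 1231)
```

The result is a file that matches neither fred's copy nor ethel's.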

If you think your risk is unacceptably high I would suggest
setting max connections = 1 in the daemon's rsyncd.conf.
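
As a sketch, the relevant part of rsyncd.conf on BigServer might look
like the fragment below.  The module name "shared" and the read only
setting are assumptions for illustration; the path is the one from
the question.

```
# rsyncd.conf fragment (module name "shared" is hypothetical)
[shared]
    path = /shared_space
    read only = no
    # allow only one client connection at a time
    max connections = 1
```

With this limit a client that connects while another transfer is
running is turned away and has to try again later, so uploads are
serialized rather than overlapped.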

If two rsyncs are doing the actual file updates (step 3
above) on the same file at the same time, the last one to
finish will win.  The update itself is atomic as far as
rsync is concerned.  Each rsync will create a tempfile with
a unique name and write into that.  Once it has a complete new
version of the file it will rename it over the original.
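
The tempfile-then-rename pattern can be sketched in a few lines of
Python.  This is not rsync's code, just the same idea under stated
assumptions: the function name atomic_write and the ".tmp." prefix
are made up, and the rename is only atomic when the tempfile is on
the same filesystem as the destination.

```python
import os
import tempfile


def atomic_write(dest, data):
    """Write data to a uniquely named tempfile, then rename it over dest."""
    directory = os.path.dirname(os.path.abspath(dest))
    # mkstemp picks a unique name, so concurrent writers never share a tempfile.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp.")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Readers see either the old file or the new one, never a mix.
        os.replace(tmp, dest)
    except BaseException:
        os.unlink(tmp)      # do not leave a stray tempfile behind on failure
        raise


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "bigfile.mov")
    atomic_write(target, b"fred's version")
    atomic_write(target, b"ethel's version")   # last rename wins
    with open(target, "rb") as f:
        print(f.read())
```

Whichever writer renames last determines the final contents, and the
file is always one complete version.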

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt


