[PATCH] --omit-dir-changes, qsort<>mergesort issues

Antti Tapaninen aet at cc.hut.fi
Thu Jun 8 10:48:41 GMT 2006


On Wed, 7 Jun 2006, Matt McCutchen wrote:

> If, as I suspect, I am completely missing the point, please explain your
> problem in such a way that I have a hope of understanding!

Heh, sorry. Here's a new description of the process that hopefully makes 
my goals clearer.

The index file that the tool maintains between syncs is just a text file 
that has three fields: the path/filename, the mtime, and the type of 
path/filename (or an MD5 sum or symlink destination).
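
A minimal sketch of reading such an index, in case it helps to picture the 
format. The tab separator, the integer epoch mtime, one entry per line, and 
the names IndexEntry and parse_index are all assumptions made for 
illustration; the actual on-disk layout is not specified above.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    path: str
    mtime: int
    kind: str  # the type, or an MD5 sum, or a symlink destination


def parse_index(text):
    """Parse an index with one entry per line (assumed tab-separated)."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first two tabs only, so the third field may contain tabs.
        path, mtime, kind = line.split("\t", 2)
        entries[path] = IndexEntry(path, int(mtime), kind)
    return entries
```

Keying the result by path makes the later index comparisons simple 
dictionary lookups.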

/alt/root is the only directory whose files get sync'ed between the central 
file server and the clients, either pushed or pulled.

/alt/local is an optional directory; anything in it takes priority over 
files that might be in /alt/root.

The /alt/backup/YYYYMMDD.HHMM hierarchy is primarily intended to save any 
pre-existing vendor files on the system that we replace when 
/alt/{local,root} gets sync'ed against the real root directory of a host.
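
Because the backup directories are named YYYYMMDD.HHMM, they can be ordered 
by date from the name alone. A rough sketch, assuming the names are available 
as plain strings; the function name backup_dirs_newest_first is hypothetical, 
and anything that does not match the naming scheme is simply skipped.

```python
from datetime import datetime


def backup_dirs_newest_first(names):
    """Return the YYYYMMDD.HHMM directory names, newest first."""
    dated = []
    for name in names:
        try:
            stamp = datetime.strptime(name, "%Y%m%d.%H%M")
        except ValueError:
            continue  # not a backup directory name, skip it
        dated.append((stamp, name))
    dated.sort(reverse=True)  # reverse date order
    return [name for _, name in dated]
```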

The actual 4-stage sync process:

- Sync /alt/{local,root} against the real root directory,
   back up replaced files to /alt/backup.

   But because I *don't* want to back up any changed files that
   previously came from /alt/{local,root}, I do a 3-way diff
   against the real root directory contents, the old index and the
   current status of /alt/{local,root}. Based on the diff, the tool
   generates an exclude file that prevents backing up our own
   files that have recently changed "legally".

   Because of this, usually the only files that get backed up are the
   ones that have been overwritten by some recent OS package upgrade
   operation. Occasionally some configuration files get backed up
   because an administrator has touched the file *directly* without
   using the /alt hierarchy, for example to make a temporary one-liner
   change to /etc/hosts.allow. If the change was that important, the
   admin can probably dig it up from /alt/backup.

- Like above, but sync without the generated exclude file, just to
   make sure some less important changes to files get there.

- Let the tool itself handle unlinking of files that we no longer
   distribute, based on an index diff as in stage #1.

- See if any /alt/backup directories exist, sort them in reverse
   date order. Based on index diffs, exclude all files in the index except
   the ones that got removed at stage #3. Sync the backup directories against
   the real root directory. Use --remove-sent-files to make sure
   that any restored vendor file is no longer in the backup directory
   after the sync; it only removes the latest version, not all of them.
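
The index comparisons in stages #1 and #3 can be sketched roughly as 
follows. This is one reading of the description above, not the tool's actual 
code: each state is assumed to be a dictionary mapping a path to some 
comparable signature (for instance an MD5 sum), and the function names are 
hypothetical.

```python
def backup_excludes(real, old_index, current):
    """Stage #1: paths whose copy in the real root need not be backed up.

    A path is excluded when the real root still holds exactly what the
    old index says we installed, and /alt now has a different version,
    i.e. the change is one of our own "legal" changes.
    """
    excludes = []
    for path, new_sig in current.items():
        real_sig = real.get(path)
        old_sig = old_index.get(path)
        if real_sig is None:
            continue  # nothing in the real root to replace
        if old_sig is not None and real_sig == old_sig and new_sig != old_sig:
            excludes.append(path)
    return sorted(excludes)


def no_longer_distributed(old_index, current):
    """Stage #3: paths in the old index that /alt no longer provides."""
    return sorted(set(old_index) - set(current))
```

Anything in the real root that differs from the old index (a package 
upgrade, a direct edit by an administrator) is left out of the exclude list 
and therefore still gets backed up.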

Overall, the time spent on these 4 stages is usually a matter of 
second(s). Between master<>client and in local sync operations, the 
mergesorting rsync really helps to achieve this task easily without doing 
much of the work outside of rsync.

Perhaps there could be some -M option to alter the sort algorithm used?
So far, I've used the mergesort() function from FreeBSD and it works 
great, but rsync would probably need a GPL'd version instead?
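
To illustrate why the choice of sort might matter, here is a small sketch of 
a stable merge sort followed by duplicate removal. It assumes the property of 
interest is stability: when the same relative path comes from both /alt/local 
and /alt/root, the entry listed first should survive. That reading, and the 
function names, are assumptions; this is not rsync's code.

```python
def merge_sort(items, key):
    """Stable merge sort: equal keys keep their original relative order."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], key)
    right = merge_sort(items[mid:], key)
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes the left element on ties, which is what gives stability.
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def drop_later_duplicates(sorted_items, key):
    """Keep only the first entry of each run of equal keys."""
    kept = []
    for item in sorted_items:
        if kept and key(kept[-1]) == key(item):
            continue
        kept.append(item)
    return kept
```

With entries given as (source, path) pairs and /alt/local listed before 
/alt/root, sorting by path and then dropping later duplicates keeps the 
/alt/local copy of any path present in both. An unstable sort such as a 
typical qsort() gives no such guarantee about which duplicate ends up first.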

Cheers,
-Antti

