File rename detection?
jw at pegasys.ws
Thu Jan 29 01:12:04 GMT 2004
On Thu, Jan 29, 2004 at 11:42:12AM +1100, Donovan Baarda wrote:
> On Thu, 2004-01-29 at 06:27, jw schultz wrote:
> > On Wed, Jan 28, 2004 at 09:06:52PM +0300, ??????? ???????? wrote:
> > > Hello!
> > >
> > > As I have found, rsync does not detect file renaming. If I just copy
> > > my backup.0.tgz (many Mbytes in size, with its md5) to backup.1.tgz
> > > (which will be equal in size and md5), it will in fact be the same
> > > file...
> > >
> > > Rsync will delete the old file (backup.0.tgz) on the receiving side
> > > and download the new one (backup.1.tgz). I do not think it would be
> > > difficult to detect this situation and do the natural thing:
> > > just rename the file on the receiving side.
> > >
> > > Am I right?
> > Reliably detecting renamed files in a manner that rsync can
> > act on is very difficult.
> I believe rsync currently sends whole-file md4sums along with the
> signature. At least when using the -c option, it _could_ use the md4sums
> to identify identical but moved files.
> Actually implementing it is another story :-)
Whole file checksums are sent for all files only if -c is
specified, slowing down the whole ball of wax. That is the
purpose of -c (having checksums for file comparison).
The key would be to completely change the delete code, which
currently runs either before or after the update stuff.
Instead, before updates are done, loops would have to be run
to correlate deleted and new files.
Rename detection wouldn't necessarily require checksums. If
a new file had the same timestamp, ownership and size as a
deleted one (even without checksums) it would most likely be
the same file. Such a correlated file should be used as the
basis file for an update, not just renamed outright. That
way a false positive would result in transferring the whole
file, as is done now. I'm not sure I'd trust just the
whole-file Adler32 and MD4 checksums to be birthday safe for
a blind rename.
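
The correlation step described above could be sketched roughly as
follows. This is only an illustration in Python, not rsync code: the
FileMeta record and the correlate() function are hypothetical names,
and the two lists stand in for whatever the delete and update passes
would produce. Each match is only a candidate basis file for a normal
update, so a false positive costs nothing more than a full transfer.

```python
from collections import defaultdict
from typing import NamedTuple


class FileMeta(NamedTuple):
    """Hypothetical per-file record; not an rsync data structure."""
    path: str
    size: int
    mtime: int
    uid: int
    gid: int


def correlate(deleted, added):
    """Map each new path to a deleted path with identical size,
    timestamp and ownership. No checksums are consulted."""
    by_key = defaultdict(list)
    for f in deleted:
        by_key[(f.size, f.mtime, f.uid, f.gid)].append(f)

    basis = {}
    for f in added:
        candidates = by_key.get((f.size, f.mtime, f.uid, f.gid))
        if candidates:
            # Use each deleted file at most once as a basis.
            basis[f.path] = candidates.pop(0).path
    return basis
```

A new file with no metadata match is simply absent from the result
and would be transferred whole, as it is today.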
Frankly, for the small payoff I'm not that eager to see this
done in the near term.
In most cases it is reasonable to adjust file naming schemes
to use less ephemeral names thereby avoiding the problem
altogether. For now, where that isn't possible and the file
sizes make rsync's current behaviour too expensive, I'd
suggest looking at a pre-rsync pass with another tool to
identify renamed files.
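
As one possible shape for such a pre-rsync pass, here is a minimal
Python sketch. It is not an existing tool: index_tree() and
plan_renames() are hypothetical helpers, SHA-256 is an arbitrary
choice of digest, and it assumes an index of each side can be built
and brought to one place before rsync runs.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks (SHA-256, an assumed choice)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_tree(root):
    """Map each file's path relative to root to its content digest."""
    index = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            index[os.path.relpath(full, root)] = file_digest(full)
    return index


def plan_renames(src_index, dst_index):
    """Return (old, new) pairs: a receiver file that no longer exists
    on the sender, matched to a sender-only file with equal content."""
    orphans = {}
    for path, digest in dst_index.items():
        if path not in src_index:
            orphans.setdefault(digest, []).append(path)

    plan = []
    for path, digest in sorted(src_index.items()):
        if path not in dst_index and orphans.get(digest):
            plan.append((orphans[digest].pop(0), path))
    return plan
```

The plan would be applied on the receiving side (for example with
os.rename) before the normal rsync run, which then finds the renamed
files already in place and still verifies them in its usual way.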
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt