Renamed files and directories

Thu Feb 26 09:48:09 GMT 2009

N.J. van der Horn (Nico) wrote:
> The highest speed and efficiency comes from observing only time and
> size, as then just a stat-call is needed.  But in more complex
> situations you also have to take the checksum, inode-number, etc. into
> account.  In previous posts there were many ideas to cope with this.
> As rsync is state-less regarding the filesystem, it needs to be
> extended by a DB to hold the previous state of the observed
> filesystem.  The DB can quickly look up files by many attributes,
> e.g. find a file by checksum or another characteristic that a standard
> filesystem cannot be asked about without doing a full scan first.

But you need to verify and update the DB contents, which requires a
stat on every file recorded in the DB.  In other words, you might
still have to scan everything :-)
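
To make that cost concrete, here is a minimal sketch in Python.  The DB
layout (a dict mapping path to size, mtime and inode) and the names
record() and verify_db() are assumptions for illustration, not anything
rsync provides.  The point is only that checking the DB costs one stat
per recorded file.

```python
import os

def record(path):
    """Build one hypothetical DB entry: (size, mtime in ns, inode)."""
    st = os.lstat(path)
    return (st.st_size, st.st_mtime_ns, st.st_ino)

def verify_db(db):
    """Compare every recorded entry against the live filesystem.

    db maps path -> (size, mtime_ns, inode); this layout is assumed.
    Returns (unchanged, changed, missing, stat_calls).
    """
    unchanged, changed, missing = [], [], []
    stat_calls = 0
    for path, recorded in db.items():
        stat_calls += 1  # one stat attempt per DB entry, no way around it
        try:
            current = record(path)
        except (FileNotFoundError, NotADirectoryError):
            # Gone from this path: deleted, or renamed somewhere else.
            missing.append(path)
            continue
        if current == recorded:
            unchanged.append(path)
        else:
            changed.append(path)
    return unchanged, changed, missing, stat_calls
```

Note that this only tells you a path is missing.  Finding where a
renamed file went (say, by matching inode numbers) still means walking
the tree for paths the DB has never seen.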

> The worst-case problem in tackling renamed files and directories is
> when they are not only moved or renamed, but also changed in
> contents.

In some ways that's equivalent to efficiently transferring one *very
large* file with small edits.  Renames of small files map to
rearranging data within the large file.  Just as you don't want to read
and checksum all files in advance, you don't want to read and checksum
all of a very large file in advance.

Algorithms which improve very-large-file-with-small-edit performance
can be adapted to cover many-files-with-renames-and-edits, and vice
versa.
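
A rough sketch of that equivalence in Python, under loud assumptions:
the old tree is treated as one pool of fixed-size blocks, SHA-1 stands
in for rsync's checksums, and the new file is scanned byte by byte with
no rolling checksum.  The names index_blocks(), plan_transfer() and
apply_plan() are made up for illustration, and this sketch reads and
checksums everything up front, which is exactly the cost you would want
a real algorithm to avoid.

```python
import hashlib

def index_blocks(old_files, block_size):
    """Index every aligned block of every old file by its digest."""
    index = {}
    for name, data in old_files.items():
        for off in range(0, len(data) - block_size + 1, block_size):
            digest = hashlib.sha1(data[off:off + block_size]).digest()
            index.setdefault(digest, (name, off))  # keep first location
    return index

def plan_transfer(old_files, new_data, block_size=8):
    """Describe new_data as copies from any old file plus literal bytes."""
    index = index_blocks(old_files, block_size)
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new_data):
        chunk = new_data[pos:pos + block_size]
        hit = None
        if len(chunk) == block_size:
            hit = index.get(hashlib.sha1(chunk).digest())
        if hit:
            if literal:  # flush pending unmatched bytes first
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", hit[0], hit[1], block_size))
            pos += block_size
        else:
            literal.append(new_data[pos])  # no match here, slide by one
            pos += 1
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops

def apply_plan(old_files, ops):
    """Rebuild the new file on the receiving side from the plan."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, name, off, length = op
            out += old_files[name][off:off + length]
        else:
            out += op[1]
    return bytes(out)
```

Because the index ignores file names, a file that was renamed and
lightly edited is still rebuilt mostly from "copy" operations, and only
the edited bytes travel as literals.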

-- Jamie