Renamed files and directories

N.J. van der Horn (Nico) nico at vanderhorn.nl
Thu Feb 26 12:43:50 GMT 2009



Jamie Lokier wrote:
> N.J. van der Horn (Nico) wrote:
>   
>> The fastest and most efficient approach is to compare only time and
>> size, as then just a stat call is needed.  But in more complex
>> situations you also have to take the checksum, inode number, etc. into
>> account.  Previous posts offered many ideas for coping with this.  As
>> rsync is stateless regarding the filesystem, it needs to be extended
>> by a DB that holds the previous state of the observed filesystem.  The
>> DB can quickly look up files by many attributes, e.g. find a file by
>> checksum or another characteristic that a standard filesystem cannot
>> answer without a full scan first.
>>     
>
> But you need to verify and update the DB contents - which requires
> stat on all the files mentioned in the DB.  In other words you might
> have to scan everything :-)
>   
This already takes place while Rsync does its job, so it does not have 
to be done separately.

Adding a DB to Rsync would give many more advantages, like:
- de-duplication (eliminating copies)
- alternative to "locate"
- filesystem statistics/analysis
If the structure is chosen well, it can also prove very valuable for 
other purposes.
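
To make the idea concrete, here is a rough sketch of such a state DB in 
Python with SQLite. Nothing in it is part of Rsync: the table layout and 
the names open_state_db, update_state and find_duplicates are my own 
assumptions for illustration. It stats every file, reads and checksums 
only those whose size or mtime changed, and can then answer "which files 
share a checksum" without a new full scan.

```python
import hashlib
import os
import sqlite3
import stat


def file_checksum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_state_db(db_path=":memory:"):
    """Open (or create) the hypothetical state database."""
    conn = sqlite3.connect(db_path)
    # inode is stored as text so very large inode numbers cannot overflow.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER,"
        " inode TEXT, checksum TEXT)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS by_checksum ON files(checksum)")
    return conn


def update_state(conn, root):
    """Walk root and refresh the DB; return how many files were re-read."""
    rehashed = 0
    seen = set()
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            if not stat.S_ISREG(info.st_mode):
                continue  # skip symlinks, sockets, devices
            seen.add(path)
            row = conn.execute(
                "SELECT size, mtime_ns FROM files WHERE path = ?", (path,)
            ).fetchone()
            if row == (info.st_size, info.st_mtime_ns):
                continue  # unchanged according to stat: keep stored checksum
            conn.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)",
                (path, info.st_size, info.st_mtime_ns,
                 str(info.st_ino), file_checksum(path)),
            )
            rehashed += 1
    # Forget files that have disappeared since the previous run.
    for (path,) in conn.execute("SELECT path FROM files").fetchall():
        if path not in seen:
            conn.execute("DELETE FROM files WHERE path = ?", (path,))
    conn.commit()
    return rehashed


def find_duplicates(conn):
    """Return lists of paths that share a checksum (candidate duplicates)."""
    groups = {}
    for checksum, path in conn.execute(
        "SELECT checksum, path FROM files ORDER BY path"
    ):
        groups.setdefault(checksum, []).append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

In this sketch a renamed file shows up as a new path and is checksummed 
once. A lookup by checksum (or by the stored inode) before deleting the 
old row would be one way to recognise it as a rename instead.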

>> The worst-case problem in tackling renamed files and directories is
>> when they are not only moved or renamed, but when they are also
>> changed in contents.
>>     
>
> In some ways that's equivalent to transferring one *very large* file
> with small edits, efficiently.  Renames of small files map to
> rearranging data in the large file.  Just as you don't want to read
> and checksum all files in advance, you don't want to read and checksum
> all of a very large file in advance.
>
> Algorithms which improve very-large-file-with-small-edit performance
> can be adapted to cover many-files-with-renames-and-edits, and vice
> versa.
>
> -- Jamie
>   
It must be possible to enable or disable checksumming when the timestamp 
and size are unchanged.
That clever trick is pretty reliable in normal Rsync usage as well and 
saves a lot of work.
We run a fully checksummed Rsync only once in a while to be sure, and 
seldom see any transfers then.
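
As an illustration of that switch, a minimal sketch in Python follows. 
It is not Rsync's code: needs_transfer and its always_checksum parameter 
are hypothetical names, and comparing mtimes at whole-second resolution 
is an assumption of mine. By default it trusts size and mtime; with 
always_checksum set it behaves in the spirit of Rsync's --checksum 
option and compares contents whenever the sizes match.

```python
import hashlib
import os


def _digest(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_transfer(src, dst, always_checksum=False):
    """Decide whether src has to be sent to dst.

    Quick check (default): only stat both files and compare size and mtime.
    Full check: ignore mtime and compare checksums when the sizes match.
    """
    if not os.path.exists(dst):
        return True
    src_info, dst_info = os.stat(src), os.stat(dst)
    if src_info.st_size != dst_info.st_size:
        return True  # different size always means different content
    if always_checksum:
        return _digest(src) != _digest(dst)
    # Assumption: mtimes are compared at one-second resolution.
    return int(src_info.st_mtime) != int(dst_info.st_mtime)
```

The quick check costs one stat per file; the full check reads both files 
completely, which is why it is worth running only occasionally.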

Nico


