Renamed files and directories
N.J. van der Horn (Nico)
nico at vanderhorn.nl
Thu Feb 26 12:43:50 GMT 2009
Jamie Lokier wrote:
> N.J. van der Horn (Nico) wrote:
>
>> The fastest and most efficient approach is to look only at time and
>> size, as then just a stat call is needed. But in more complex
>> situations you also have to take the checksum, inode number, etc.
>> into account. Previous posts offered many ideas for coping with this.
>> As rsync is stateless with regard to the filesystem, it needs to be
>> extended with a DB that holds the previous state of the observed
>> filesystem. The DB can quickly look up files by many attributes, e.g.
>> find a file by checksum or by another characteristic that a standard
>> filesystem cannot be asked for without doing a full scan first.
>>
>
> But you need to verify and update the DB contents - which requires
> stat on all the files mentioned in the DB. In other words you might
> have to scan everything :-)
>
This already takes place while rsync does its job, so it does not have
to be done separately.
Adding a DB to rsync would bring further advantages, such as:
- de-duplication (eliminating copies)
- alternative to "locate"
- filesystem statistics/analysis
If the structure is chosen well, it can prove very valuable for other
purposes as well.
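A minimal sketch of such a state DB, to make the idea concrete. It is not
rsync code and rsync has no such feature; the SQLite schema and the names
open_state_db, update_state and find_duplicates are assumptions made up
for this illustration. Each scan stats every regular file, reuses the
stored checksum when size and mtime are unchanged, and reports a new path
as a rename when its checksum matches a path that has vanished.

```python
import hashlib
import os
import sqlite3
import stat


def file_checksum(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def open_state_db(db_path):
    """Open (or create) the hypothetical state DB, indexed by checksum."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER,"
        " inode TEXT, checksum TEXT)"
    )
    db.execute("CREATE INDEX IF NOT EXISTS by_checksum ON files(checksum)")
    return db


def update_state(db, root):
    """Rescan root, refresh the DB, and return [(old_path, new_path), ...]."""
    old = {row[0]: row[1:] for row in db.execute(
        "SELECT path, size, mtime_ns, inode, checksum FROM files")}
    current = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, sockets, ...
            prev = old.get(path)
            if prev and prev[0] == st.st_size and prev[1] == st.st_mtime_ns:
                checksum = prev[3]  # unchanged by size+mtime: reuse checksum
            else:
                checksum = file_checksum(path)
            current[path] = (st.st_size, st.st_mtime_ns,
                             str(st.st_ino), checksum)

    # Paths known from the previous scan that no longer exist, by checksum.
    vanished = {}
    for path, rec in old.items():
        if path not in current:
            vanished.setdefault(rec[3], []).append(path)

    # A new path whose content matches a vanished path counts as a rename.
    renames = []
    for path in sorted(current):
        if path in old:
            continue
        candidates = vanished.get(current[path][3])
        if candidates:
            renames.append((candidates.pop(0), path))

    with db:  # replace the stored state in one transaction
        db.execute("DELETE FROM files")
        db.executemany("INSERT INTO files VALUES (?, ?, ?, ?, ?)",
                       [(p, *rec) for p, rec in current.items()])
    return renames


def find_duplicates(db):
    """Return groups of paths that share a checksum (de-duplication aid)."""
    groups = {}
    rows = db.execute(
        "SELECT checksum, path FROM files WHERE checksum IN ("
        " SELECT checksum FROM files GROUP BY checksum HAVING COUNT(*) > 1)"
        " ORDER BY checksum, path")
    for checksum, path in rows:
        groups.setdefault(checksum, []).append(path)
    return list(groups.values())
```

This only catches renames where the content is unchanged. It still stats
every file on each scan, but it reads and checksums only files that are
new or whose size or mtime changed.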
>> The worst case in tackling renamed files and directories is when
>> they are not only moved or renamed, but their contents have also
>> changed.
>>
>
> In some ways that's equivalent to transferring one *very large* file
> with small edits, efficiently. Renames of small files map to
> rearranging data in the large file. Just as you don't want to read
> and checksum all files in advance, you don't want to read and checksum
> all of a very large file in advance.
>
> Algorithms which improve very-large-file-with-small-edit performance
> can be adapted to cover many-files-with-renames-and-edits, and vice
> versa.
>
> -- Jamie
>
It must be possible to enable or disable checksumming when the timestamp
and size are unchanged.
That clever trick (trusting an unchanged timestamp and size) is pretty
reliable in normal rsync usage as well and saves a lot of work.
Only once in a while do we run a fully checksummed rsync to be sure, and
it seldom results in any transfers.
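The switch between the two modes can be sketched as follows. This is an
illustration only, loosely modelled on rsync's default quick check and
its --checksum option, not rsync's actual code; the function name
needs_transfer and its always_checksum parameter are hypothetical.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_transfer(src, dst, always_checksum=False):
    """Decide whether src should be copied over dst."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    if s.st_size != d.st_size:
        return True  # different sizes can never be identical content
    if not always_checksum:
        # Quick check: same size and same mtime (to the second) is
        # trusted as "unchanged" without reading either file.
        return int(s.st_mtime) != int(d.st_mtime)
    # Paranoid mode: ignore timestamps and compare the contents.
    return file_checksum(src) != file_checksum(dst)
```

The quick check costs two stat calls per file. The checksum mode reads
both files in full, which is why it suits an occasional verification run
rather than everyday use.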
Nico