Moved/Renamed Files

Boris Toloknov tlknv at yandex.ru
Fri Jan 4 21:21:53 GMT 2008


Ming Zhang wrote:
> On Fri, 2008-01-04 at 15:05 -0500, Boris Toloknov wrote:
>   
>> Ming Zhang wrote: 
>>     
>>> On Fri, 2008-01-04 at 14:12 -0500, Boris Toloknov wrote:
>>>   
>>>       
>>>> Ming Zhang wrote: 
>>>>     
>>>>         
>>>>> On Thu, 2008-01-03 at 20:19 -0500, Boris Toloknov wrote:
>>>>>   
>>>>>       
>>>>>           
>>>>>> Hi,
>>>>>> It seems that rsync transfers files whose names were changed or which
>>>>>> were moved to another directory since the previous synchronization. I
>>>>>> think that the ability not to transfer (large) files which are already
>>>>>> present on the other computer would be very helpful. Right before rsync
>>>>>> transfers a large file, it could check whether there are other files
>>>>>> with the same size (and maybe the same mtime) on the destination
>>>>>> computer. If the destination computer has such files, it could be
>>>>>> asked to find the file with the given MD5. If it's found, there is no
>>>>>> need to transfer that file. A local copy/rename/move can be performed
>>>>>> instead.
>>>>>>     
>>>>>>         
>>>>>>             
>>>>> let us say you have N files in one directory and you rename the
>>>>> directory. so for each of the N files, you need to check all M files
>>>>> on the destination side to see if one is the renamed file. so you do
>>>>> NxM comparisons and this is not scalable at all...
>>>>>   
>>>>>       
>>>>>           
>>>> I think that a hash could be used instead of that. The destination
>>>> computer (at least) must have a list of all the files in the
>>>> destination directory. The key = size + mtime and value = pointer to
>>>> the file entry in the list. Actually, for that operation it would be
>>>> better to have that list and hash on the sending computer.
>>>>     
>>>>         
>>> rsync 3.0 introduces incremental scan to avoid the OOM issue, so the hash
>>> needs to be optional as well... also i think this hash can be used to
>>> detect hard links at the same time. for normal use, it should be ok.
>>>   
>>>       
>> I agree that with incremental scan the "move/rename" feature can be
>> optional. Anyway, to minimize memory usage (if necessary), a
>> sorted list can be used instead of a hash, and the list of all files could
>> be stored in a temporary file with buffered access to it. In that
>> case the key = size + mtime, value = offset in the file with the list.
>>     
>
> another issue is that rsync needs to build this list up front before
> handling file transfer. this can take quite some time on a huge file
> system (when i say huge, i mean a file system with 20-100m files)...
>
> also rsync already has some rename detection. please check the command
> line options.
>   
I don't mind having "move/rename" detection as an optional feature that
is turned off by default. Actually, that list doesn't have to contain all
the files. Files smaller than some configurable size (for example,
100KB) don't need to be in the list. So it likely won't take much
memory and time (for sorting) even for huge systems. Scanning the
file tree takes some time, though. A 1TB HDD filled with 100,000,000
files has an average file size of about 10KB.

I have 2.6.9 and didn't find any command line option for rename
detection. I just found that there is a patch, "--detect-renamed". But
it seems that the patch doesn't detect files which were moved to
another directory. The "News file" for 3.0.0pre7 doesn't mention anything
about rename detection.
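
To make the idea more concrete, here is a minimal Python sketch of the
size + mtime index with a minimum-size threshold and an MD5 confirmation
step. It is only an illustration of the proposal, not how rsync or the
"--detect-renamed" patch works. The names MIN_SIZE, build_index and
find_moved are hypothetical, and both trees are assumed to be reachable
as local paths, which a real implementation over the network would not
have.

```python
import hashlib
import os
from collections import defaultdict

# Hypothetical threshold: files smaller than this are left out of the index.
MIN_SIZE = 100 * 1024


def file_md5(path, chunk=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def build_index(root, min_size=MIN_SIZE):
    """Map (size, mtime in whole seconds) -> list of paths relative to root."""
    index = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # the sketch only considers regular files
            st = os.stat(path)
            if st.st_size >= min_size:
                key = (st.st_size, int(st.st_mtime))
                index[key].append(os.path.relpath(path, root))
    return index


def find_moved(src_root, dst_root, min_size=MIN_SIZE):
    """Return {source path: existing destination path} for likely moves.

    A source file counts as moved when nothing exists at the same relative
    path on the destination, but a destination file with the same size and
    mtime also has the same MD5.
    """
    dst_index = build_index(dst_root, min_size)
    moves = {}
    for key, src_paths in build_index(src_root, min_size).items():
        for rel in src_paths:
            if os.path.exists(os.path.join(dst_root, rel)):
                continue  # same path exists; leave it to the normal transfer
            candidates = dst_index.get(key, [])
            if not candidates:
                continue  # no size + mtime match, so no checksum is needed
            src_sum = file_md5(os.path.join(src_root, rel))
            for cand in candidates:
                if file_md5(os.path.join(dst_root, cand)) == src_sum:
                    moves[rel] = cand
                    break
    return moves
```

Each source file costs one dictionary lookup instead of a scan over all
M destination files, and MD5 is computed only for files that already
match on size and mtime.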

Boris
