Moved/Renamed Files

Boris Toloknov tlknv at yandex.ru
Fri Jan 4 21:41:05 GMT 2008


Ming Zhang wrote:
> On Fri, 2008-01-04 at 16:21 -0500, Boris Toloknov wrote:
>   
>> Ming Zhang wrote: 
>>     
>>> On Fri, 2008-01-04 at 15:05 -0500, Boris Toloknov wrote:
>>>   
>>>       
>>>> Ming Zhang wrote: 
>>>>     
>>>>         
>>>>> On Fri, 2008-01-04 at 14:12 -0500, Boris Toloknov wrote:
>>>>>   
>>>>>       
>>>>>           
>>>>>> Ming Zhang wrote: 
>>>>>>     
>>>>>>         
>>>>>>             
>>>>>>> On Thu, 2008-01-03 at 20:19 -0500, Boris Toloknov wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> It seems that rsync transfers files whose names were changed or which
>>>>>>>> were moved to another directory since the previous synchronization. I
>>>>>>>> think that ability not to transfer (large) files which are present on
>>>>>>>> another computer would be very helpful. Right before rsync is going to
>>>>>>>> transfer some large file it could check if there are other files with
>>>>>>>> the same size (and maybe the same mtime) on the destination
>>>>>>>> computer. If the destination computer has such files then it
>>>>>>>> could be asked to find the file with given MD5. If it's found then
>>>>>>>> there is no need to transfer that file. Local copy/rename/move can be
>>>>>>>> performed instead.
>>>>>>>>
>>>>>>> let us say you have N files in one directory and you rename the
>>>>>>> directory. so for each of the N files, you need to check all M files
>>>>>>> on the destination side to see if it is the renamed one. so you do
>>>>>>> NxM comparisons and this is not scalable at all...
>>>>>>>   
>>>>>>>       
>>>>>>>           
>>>>>>>               
>>>>>> I think that a hash could be used instead of that. The destination
>>>>>> computer (at least) must have a list of all the files in the
>>>>>> destination directory. The key = size + mtime and value = pointer to
>>>>>> the file entry in the list. Actually for that operation it would be
>>>>>> better to have that list and hash on the sending computer.
>>>>>>     
>>>>>>         
>>>>>>             
>>>>> rsync 3.0 introduces an incremental scan to avoid the OOM issue, so the
>>>>> hash needs to be optional as well... also i think this hash can be used
>>>>> to detect hard links at the same time. for normal use, it should be ok.
>>>>>   
>>>>>       
>>>>>           
>>>> I agree that with the incremental scan the "move/rename" feature can be
>>>> optional. Anyway, to minimize memory usage (if it's necessary) a
>>>> sorted list can be used instead of a hash, and the list of all files could
>>>> be stored in a temporary file with buffered access to it. In that
>>>> case the key = size + mtime, value = offset in the file with the list.
>>>>     
>>>>         
>>> another issue is that rsync needs to build this list up front before
>>> handling file transfers. this can take quite some time on a huge file
>>> system (when i say huge, i mean a file system with 20-100m files)...
>>>
>>> also rsync already has some rename detection. please check the command
>>> line options.
>>>   
>>>       
>> I don't mind having "move/rename" detection as an optional feature
>> that is turned off by default. Actually that list doesn't have to contain
>> all the files. Files smaller than some configurable size (for
>> example 100KB) don't need to be in the list. So it likely won't
>> take much memory or time (for sorting) even on huge systems.
>> Scanning the file tree takes some time though. A 1TB HDD filled up
>> with 100,000,000 files has an average file size of about 10KB.
>> I have rsync 2.6.9 and didn't find any command line option for rename
>> detection. I only found that there is a patch, "--detect-renamed".
>> But it seems that the patch doesn't detect files which were moved to
>> another directory. The "NEWS" file for 3.0.0pre7 doesn't say anything
>> about rename detection.
>>     
>
> i must be remembering the feature because of this patch.
>
> another way is to use inotify, generate a list of moved files, pass the
> list to the receiver side, and handle the list before running rsync.
>   
Of course there are many ways to handle move/rename without rsync. 
However that isn't very easy and I think that "move/rename" detection 
would be helpful for many/most rsync users.

Boris
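
For illustration, the size + mtime lookup proposed in this thread could be
sketched in Python roughly as follows. This is only a sketch under stated
assumptions, not how rsync or the "--detect-renamed" patch works: the function
names (file_md5, build_index, find_moved), the MIN_SIZE threshold and the use
of whole-second mtimes are all hypothetical, and both trees are assumed to be
readable from one machine.

```python
import hashlib
import os
from collections import defaultdict

# Hypothetical threshold; the thread suggests something like 100KB.
MIN_SIZE = 100 * 1024


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root, min_size=MIN_SIZE):
    """Map (size, mtime) to relative paths of files of at least min_size bytes."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.stat(full)
            if info.st_size >= min_size:
                key = (info.st_size, int(info.st_mtime))
                index[key].append(os.path.relpath(full, root))
    return index


def find_moved(src_root, dst_root, min_size=MIN_SIZE):
    """Return {source path: destination path} for files that look moved."""
    dst_index = build_index(dst_root, min_size)
    moved = {}
    for key, src_paths in build_index(src_root, min_size).items():
        candidates = dst_index.get(key, [])
        for rel in src_paths:
            if os.path.exists(os.path.join(dst_root, rel)):
                continue  # same path already on the destination: not a move
            if not candidates:
                continue  # no size + mtime match, so no checksum is needed
            src_sum = file_md5(os.path.join(src_root, rel))
            for cand in candidates:
                # Checksum confirms the match; size + mtime is only a filter.
                if file_md5(os.path.join(dst_root, cand)) == src_sum:
                    moved[rel] = cand
                    break
    return moved
```

Each lookup is a dictionary access on (size, mtime), so the NxM comparison
raised earlier in the thread is avoided, and MD5 is computed only for files
that already have a candidate. The result could then drive a local
copy/rename on the destination before a normal rsync run.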
