Moved/Renamed Files

Ming Zhang blackmagic02881 at gmail.com
Fri Jan 4 21:29:37 GMT 2008


On Fri, 2008-01-04 at 16:21 -0500, Boris Toloknov wrote:
> Ming Zhang wrote: 
> > On Fri, 2008-01-04 at 15:05 -0500, Boris Toloknov wrote:
> >   
> > > Ming Zhang wrote: 
> > >     
> > > > On Fri, 2008-01-04 at 14:12 -0500, Boris Toloknov wrote:
> > > >   
> > > > > Ming Zhang wrote: 
> > > > >     
> > > > > > On Thu, 2008-01-03 at 20:19 -0500, Boris Toloknov wrote:
> > > > > >   
> > > > > > > Hi,
> > > > > > > It seems that rsync transfers files whose names were changed or which
> > > > > > > were moved to another directory since the previous synchronization. I
> > > > > > > think that the ability not to transfer (large) files which are already
> > > > > > > present on the other computer would be very helpful. Right before rsync
> > > > > > > transfers a large file, it could check whether there are other files
> > > > > > > with the same size (and maybe the same mtime) on the destination
> > > > > > > computer. If the destination computer has such files, it could be asked
> > > > > > > to find the file with the given MD5. If it's found, there is no need to
> > > > > > > transfer that file; a local copy/rename/move can be performed instead.
> > > > > > >
> > > > > > Let us say you have N files in one directory and you rename the
> > > > > > directory. Then for each of the N files you need to check all M files
> > > > > > on the destination side to see whether one of them is the renamed one.
> > > > > > So you do N x M comparisons, and this is not scalable at all...
> > > > > >
> > > > > I think that a hash could be used instead. The destination
> > > > > computer (at least) must have a list of all the files in the
> > > > > destination directory, with key = size + mtime and value = pointer to
> > > > > the file entry in the list. Actually, for this operation it would be
> > > > > better to have that list and hash on the sending computer.
> > > > >
> > > > rsync 3.0 introduces incremental scanning to avoid the OOM issue, so the
> > > > hash needs to be optional as well... I also think this hash could be used
> > > > to detect hard links at the same time. For normal use, it should be OK.
> > > >
> > > I agree that with incremental scanning the "move/rename" feature can be
> > > optional. Anyway, to minimize memory usage (if necessary), a sorted
> > > list can be used instead of a hash, and the list of all files could
> > > be stored in a temporary file with buffered access to it. In that
> > > case key = size + mtime, value = offset of the entry in the list file.
> > >     
> > 
> > Another issue is that rsync needs to build this list up front, before
> > handling any file transfer. This can take quite some time on a huge file
> > system (when I say huge, I mean a file system with 20-100M files)...
> > 
> > Also, rsync already has some rename detection. Please check the
> > command-line options.
> >   
> I wouldn't mind having "move/rename" detection as an optional feature
> that is turned off by default. Actually, that list doesn't have to contain
> all the files. Files smaller than some configurable size (for
> example 100KB) don't need to be in the list, so it likely won't
> take much memory or time (for sorting) even on huge systems.
> Scanning the file tree does take some time, though: a 1TB HDD filled
> with 100,000,000 files has an average file size of about 10KB.
> I have 2.6.9 and didn't find any command-line option for rename
> detection. I only found that there is a patch, "--detect-renamed",
> but it seems that this patch doesn't detect files which were moved to
> another directory. The NEWS file for 3.0.0pre7 doesn't mention
> rename detection.

I must be remembering the feature from this patch.
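A minimal sketch of the size+mtime index discussed above, in Python. This is not rsync code: the function names, the 100KB MIN_SIZE threshold, and the assumption that the sender supplies the file's MD5 are all illustrative. Files below the threshold are skipped, candidates are looked up by (size, mtime), and only those candidates are checksummed, so a renamed directory costs one hash lookup per file rather than N x M comparisons.

```python
import hashlib
import os
import stat
from collections import defaultdict

# Hypothetical threshold: files smaller than this are left to a normal transfer.
MIN_SIZE = 100 * 1024


def md5_of(path, chunk=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def build_index(root, min_size=MIN_SIZE):
    """Map (size, mtime in whole seconds) -> list of regular-file paths under root."""
    index = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Skip symlinks, devices, etc., and anything below the size threshold.
            if not stat.S_ISREG(st.st_mode) or st.st_size < min_size:
                continue
            index[(st.st_size, int(st.st_mtime))].append(path)
    return index


def find_local_copy(src_size, src_mtime, src_md5, index):
    """Return a destination path with identical content, or None if none matches."""
    for candidate in index.get((src_size, int(src_mtime)), []):
        # Size + mtime only narrows the search; the checksum confirms the match.
        if md5_of(candidate) == src_md5:
            return candidate
    return None
```

If find_local_copy returns a path, the receiver could copy or rename that file locally instead of receiving the data. Replacing the dict with a sorted list of (size, mtime, path) tuples searched with bisect would trade lookup speed for lower memory use, closer to the sorted-list variant Boris describes.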

Another way is to use inotify: generate a list of moved files, pass the
list to the receiver side, and apply the moves there before running rsync.
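One way to picture the inotify variant, as a sketch only. It assumes a watcher on the sending side records (event, cookie, path) tuples, relying on the fact that inotify gives the IN_MOVED_FROM and IN_MOVED_TO halves of a rename the same cookie. The watcher itself, the tuple format, and the helper names pair_moves and apply_moves are hypothetical. The receiver replays the paired moves before rsync runs and skips anything that doesn't apply cleanly, leaving it to the normal rsync pass.

```python
import os


def pair_moves(events):
    """Pair MOVED_FROM/MOVED_TO events by cookie into (old, new) relative paths.

    events: iterable of (kind, cookie, path), kind being "MOVED_FROM" or "MOVED_TO".
    """
    pending = {}
    moves = []
    for kind, cookie, path in events:
        if kind == "MOVED_FROM":
            pending[cookie] = path
        elif kind == "MOVED_TO" and cookie in pending:
            moves.append((pending.pop(cookie), path))
    # Unmatched MOVED_FROM: moved out of the watched tree (rsync sees a deletion).
    # Unmatched MOVED_TO: moved in from outside (rsync transfers it normally).
    return moves


def apply_moves(root, moves):
    """Replay moves under root on the receiver; return how many were applied."""
    applied = 0
    for old, new in moves:
        src = os.path.join(root, old)
        dst = os.path.join(root, new)
        if not os.path.exists(src) or os.path.exists(dst):
            continue  # does not apply cleanly; let the rsync run handle it
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        os.rename(src, dst)
        applied += 1
    return applied
```

Moves are replayed in the order they completed, so a chain such as a -> b -> c ends up in the right place. After apply_moves, a normal rsync run should find the moved files already present and only needs to verify them.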


> 
> Boris
-- 
Ming Zhang


@#$%^ purging memory... (*!%
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881
--------------------------------------------


