Moved/Renamed Files

Ming Zhang blackmagic02881 at gmail.com
Fri Jan 4 20:22:27 GMT 2008


On Fri, 2008-01-04 at 15:05 -0500, Boris Toloknov wrote:
> Ming Zhang wrote: 
> > On Fri, 2008-01-04 at 14:12 -0500, Boris Toloknov wrote:
> >   
> > > Ming Zhang wrote: 
> > >     
> > > > On Thu, 2008-01-03 at 20:19 -0500, Boris Toloknov wrote:
> > > >   
> > > >       
> > > > > Hi,
> > > > > It seems that rsync transfers files whose names were changed or which
> > > > > were moved to another directory since the previous synchronization. I
> > > > > think that ability not to transfer (large) files which are present on
> > > > > another computer would be very helpful. Right before rsync is going to
> > > > > transfer some large file it could check if there are other files with
> > > > > the same size (and maybe the same mtime) on the destination
> > > > > computer. If the destination computer has such files then it
> > > > > could be asked to find the file with the given MD5. If it's found then
> > > > > there is no need to transfer that file. Local copy/rename/move can be
> > > > > performed instead.
> > > > >     
> > > > >         
> > > > let us say you have N files in one directory and you rename the
> > > > directory. so for each of the N files, you need to check all M files
> > > > on the destination side to see if one is the renamed one. so you do
> > > > NxM comparisons and this is not scalable at all...
> > > >   
> > > >       
> > > I think that a hash could be used instead of that. The destination
> > > computer (at least) must have a list of all the files in the
> > > destination directory. The key = size + mtime and value = pointer to
> > > the file entry in the list. Actually for that operation it would be
> > > better to have that list and hash on the sending computer.
> > >     
> > 
> > rsync 3.0 introduces incremental scan to avoid the OOM issue, so the hash
> > needs to be optional as well... also i think this hash can be used to
> > detect hard links at the same time. for normal use, it should be ok.
> >   
> I agree that with incremental scan the "move/rename" feature can be
> optional. Anyway, to minimize memory usage (if it's necessary) a
> sorted list can be used instead of a hash, and a list of all files could
> be stored in a temporary file with buffered access to it. In that
> case the key = size + mtime, value = offset in the file with the list.

another issue is that rsync needs to build this list up front before
handling file transfer. this can take quite some time on a huge file
system (when i say huge, i mean a file system with 20-100 million files)...

also rsync already has some rename detection. please check the command
line options.
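
for reference, the scheme proposed earlier in this thread (look up
candidates by size and mtime, then confirm with MD5) could be sketched in
Python roughly as below. this is only an illustration under assumptions,
not how rsync works: the function names are hypothetical, mtime is
truncated to whole seconds, and the index is built up front in memory,
which is exactly the cost described above.

```python
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are never loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_index(root):
    """Walk the destination tree once and map (size, mtime) -> list of paths."""
    index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            index[(st.st_size, int(st.st_mtime))].append(path)
    return index


def find_moved(index, size, mtime, md5_hex):
    """Return an existing destination path with the same content, or None."""
    # Only files with matching size and mtime are hashed, so the
    # expensive MD5 step runs on a handful of candidates at most.
    for path in index.get((size, int(mtime)), []):
        if md5_of(path) == md5_hex:
            return path
    return None
```

if find_moved returns a path, the receiver could copy or rename that file
locally instead of transferring it again.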


> 
> Boris
-- 
Ming Zhang


@#$%^ purging memory... (*!%
http://blackmagic02881.wordpress.com/
http://www.linkedin.com/in/blackmagic02881
--------------------------------------------


