proposal to speed rsync with lots of files
Peter Salameh
psalameh at ucsd.edu
Thu Mar 5 23:27:50 GMT 2009
Hello,
I have followed the discussion of speeding up rsync when there are lots
of files, and I have a proposal which I think would greatly speed rsync
when doing routine mirroring of large filesystems.
One of the speed-limiting issues with rsync is having to send huge file
lists when mirroring large file systems, even for incremental updates
where only a small part of the file system might have changed. My
proposal is to first send a checksum of the file list for each
directory. If it is found to be identical to the same checksum on the
remote side, then the list need not be sent for that directory! That
would greatly reduce the size of the file list when there are
directories containing many files that do not change from one rsync run
to the next.
Here's an example:
remote    local
dir1      dir1 - file list checksum same as on remote      -> don't send file list for dir1
dir2      dir2 - file list checksum same as on remote      -> don't send file list for dir2
dir3      dir3 - file list checksum different from remote  -> send file list for dir3
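A minimal sketch of the per-directory comparison, in Python, under stated assumptions: each file-list entry is reduced to (name, size, mtime, mode), the directory checksum is MD5 over the sorted, serialized entries, and the helper names (dir_list_checksum, scan_tree, dirs_needing_lists) are hypothetical. None of this is rsync's actual protocol. The usage at the bottom mirrors the example above, where only dir3 needs its list sent.

```python
import hashlib
import os
from typing import Dict, List, Tuple

# One file-list entry: (name, size, mtime, mode). Assumption: these are the
# attributes compared; rsync's real file list carries more information.
Entry = Tuple[str, int, int, int]


def dir_list_checksum(entries: List[Entry]) -> str:
    """Checksum of one directory's file list, independent of listing order."""
    h = hashlib.md5()
    for name, size, mtime, mode in sorted(entries):
        # NUL-separated fields, newline-terminated, so entries can't run together.
        record = f"{name}\0{size}\0{mtime}\0{mode}\n"
        h.update(record.encode("utf-8", "surrogateescape"))
    return h.hexdigest()


def scan_tree(root: str) -> Dict[str, List[Entry]]:
    """Map each directory (relative to root) to the entries it directly contains."""
    tree: Dict[str, List[Entry]] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        entries = []
        for name in dirnames + filenames:
            st = os.lstat(os.path.join(dirpath, name))
            entries.append((name, st.st_size, int(st.st_mtime), st.st_mode))
        tree[os.path.relpath(dirpath, root)] = entries
    return tree


def dirs_needing_lists(local: Dict[str, List[Entry]],
                       remote_sums: Dict[str, str]) -> List[str]:
    """Directories whose list must still be sent: remote checksum missing or different."""
    return sorted(d for d, entries in local.items()
                  if remote_sums.get(d) != dir_list_checksum(entries))


# Usage with in-memory trees (a real run would call scan_tree on each side).
local = {
    "dir1": [("a.txt", 10, 100, 0o100644)],
    "dir2": [("b.txt", 20, 200, 0o100644)],
    "dir3": [("c.txt", 31, 300, 0o100644)],
}
remote = {
    "dir1": [("a.txt", 10, 100, 0o100644)],
    "dir2": [("b.txt", 20, 200, 0o100644)],
    "dir3": [("c.txt", 30, 250, 0o100644)],
}
# Only these short digests would cross the wire for unchanged directories.
remote_sums = {d: dir_list_checksum(e) for d, e in remote.items()}
print(dirs_needing_lists(local, remote_sums))  # ['dir3']
```

Sorting the entries before hashing means both sides get the same digest regardless of the order in which the file system returns names.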
It might even be possible to use the rsync checksum algorithm on the
directory lists themselves to determine which portions of the directory
lists to send, in the case of directories that are nearly identical.
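A rough sketch of that idea in Python: treat a directory's serialized file list as a byte string and compute an rsync-style delta of the new listing against the old one. The weak checksum follows the rolling (a, b) sums mod 2^16 from the rsync algorithm; MD5 stands in for the strong checksum. The block size, the one-line-per-entry listing format, and all function names are assumptions for illustration. Only full blocks of the old listing are matched, a simplification compared with rsync itself.

```python
import hashlib
from typing import Dict, List, Tuple, Union

M = 1 << 16  # modulus for the weak rolling checksum


def weak(block: bytes) -> Tuple[int, int]:
    """rsync-style weak checksum: a = sum of bytes, b = position-weighted sum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def signature(old: bytes, bs: int) -> Dict[int, List[Tuple[int, bytes]]]:
    """Index each full block of the old listing by weak sum, keeping a strong hash."""
    sig: Dict[int, List[Tuple[int, bytes]]] = {}
    for idx in range(len(old) // bs):
        blk = old[idx * bs:(idx + 1) * bs]
        a, b = weak(blk)
        sig.setdefault(a | (b << 16), []).append((idx, hashlib.md5(blk).digest()))
    return sig


def delta(new: bytes, sig: Dict[int, List[Tuple[int, bytes]]], bs: int) -> List[Union[int, bytes]]:
    """Ops: an int means 'copy that block of the old listing', bytes are literal data."""
    ops: List[Union[int, bytes]] = []
    literal = bytearray()
    pos, a, b = 0, None, None
    while pos + bs <= len(new):
        if a is None:  # (re)start the checksum after a match
            a, b = weak(new[pos:pos + bs])
        window = new[pos:pos + bs]
        match = next((idx for idx, strong in sig.get(a | (b << 16), [])
                      if hashlib.md5(window).digest() == strong), None)
        if match is not None:
            if literal:
                ops.append(bytes(literal))
                literal.clear()
            ops.append(match)
            pos += bs
            a = None
            continue
        # No match: emit one literal byte and roll the window forward by one.
        out = new[pos]
        literal.append(out)
        pos += 1
        if pos + bs <= len(new):
            a = (a - out + new[pos + bs - 1]) % M
            b = (b - bs * out + a) % M
    literal.extend(new[pos:])
    if literal:
        ops.append(bytes(literal))
    return ops


def apply_delta(old: bytes, ops: List[Union[int, bytes]], bs: int) -> bytes:
    """Rebuild the new listing from the old one plus the delta."""
    out = bytearray()
    for op in ops:
        out += old[op * bs:(op + 1) * bs] if isinstance(op, int) else op
    return bytes(out)


# Usage: a 50-entry listing in which one file's size changed.
BS = 64
old = b"".join(f"file{i:03d}.dat 1024 1700000000\n".encode() for i in range(50))
new = old.replace(b"file025.dat 1024", b"file025.dat 2048")
ops = delta(new, signature(old, BS), BS)
print(sum(len(op) for op in ops if isinstance(op, bytes)), "literal bytes of", len(new))
```

For a nearly identical directory, most of the listing travels as block references and only the changed region (plus the unmatched tail) is sent literally.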
I would appreciate hearing from the rsync developers whether this is
feasible with the current implementation and whether they think it would
help.
Thanks,
Peter Salameh