proposal to speed rsync with lots of files

Peter Salameh psalameh at ucsd.edu
Thu Mar 5 23:27:50 GMT 2009


Hello,

I have followed the discussion of speeding up rsync when there are lots 
of files, and I have a proposal which I think would greatly speed rsync 
when doing routine mirroring of large filesystems.

One of the speed-limiting issues with rsync is having to send huge file 
lists when mirroring large file systems, even for incremental updates 
where only a small part of the file system might have changed.  My 
proposal is to first send a checksum of the file list for each 
directory.  If it is found to be identical to the checksum of the same 
directory on the remote side, then the list need not be sent for that 
directory!  That would greatly reduce the size of the file list when 
there are directories containing many files that do not change from one 
rsync run to the next.
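A minimal sketch of what "a checksum of the file list for each directory" could look like. It assumes each list entry carries a name, size, mtime and mode (the `Entry` type and `dir_list_checksum` are hypothetical names, not rsync code), and it uses SHA-256 purely for illustration. Sorting by name means both sides hash the entries in the same order.

```python
import hashlib
from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    """Hypothetical file-list entry: a subset of the metadata rsync compares."""
    name: str
    size: int
    mtime: int
    mode: int


def dir_list_checksum(entries: Iterable[Entry]) -> str:
    """Checksum of one directory's file list.

    Entries are sorted by name so both sides hash them in the same order.
    Fields are NUL-terminated; POSIX file names cannot contain NUL, so the
    byte stream fed to the hash is unambiguous.
    """
    h = hashlib.sha256()
    for e in sorted(entries, key=lambda e: e.name):
        h.update(f"{e.name}\0{e.size}\0{e.mtime}\0{e.mode:o}\0".encode())
    return h.hexdigest()
```

With this, the two sides would exchange one short digest per directory instead of one entry per file, and any change to a name, size, mtime or mode in that directory changes the digest.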

Here's an example:

      remote    local
      dir1      dir1   file-list checksum same as on remote      -> don't send file list for dir1
      dir2      dir2   file-list checksum same as on remote      -> don't send file list for dir2
      dir3      dir3   file-list checksum different from remote  -> send file list for dir3
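The decision in this example can be sketched as a comparison of two maps from directory path to file-list checksum, one per side. The function name and the checksum strings below are made-up placeholders; handling directories that exist only on the receiving side (e.g. for `--delete`) is deliberately left out.

```python
def dirs_needing_lists(source: dict[str, str], dest: dict[str, str]) -> list[str]:
    """Directories whose full file list still has to be sent.

    `source` and `dest` map a directory path to its file-list checksum on
    each side. A directory is skipped only when both checksums match; a
    directory missing on the destination always needs its list.
    """
    return sorted(d for d, csum in source.items() if dest.get(d) != csum)


# The three directories from the example (checksum values are placeholders).
remote = {"dir1": "9f2c", "dir2": "41ab", "dir3": "d07e"}
local = {"dir1": "9f2c", "dir2": "41ab", "dir3": "5c33"}
print(dirs_needing_lists(remote, local))  # ['dir3']
```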

It might even be possible to apply the rsync checksum algorithm to the 
directory lists themselves, to determine which portions of the directory 
lists to send in the case of directories that are nearly identical.
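A minimal sketch of that idea, assuming each directory list is serialized to bytes on both sides. It follows rsync's scheme of a rolling weak checksum confirmed by a strong hash, but is simplified: SHA-256 stands in for rsync's strong checksum, only full blocks of the receiver's old list are indexed (a short tail is always sent literally), and all function names are hypothetical. Only the `"data"` pieces would need to cross the wire.

```python
import hashlib

M = 1 << 16  # the weak checksum keeps two 16-bit running sums


def weak_sum(block: bytes) -> tuple[int, int]:
    """Weak checksum: a = sum of bytes, b = position-weighted sum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def block_signatures(old: bytes, bs: int) -> dict[tuple[int, int], list[tuple[int, bytes]]]:
    """Receiver: weak and strong checksums of each full block of its old list."""
    table: dict[tuple[int, int], list[tuple[int, bytes]]] = {}
    for idx in range(len(old) // bs):
        blk = old[idx * bs:(idx + 1) * bs]
        table.setdefault(weak_sum(blk), []).append((idx, hashlib.sha256(blk).digest()))
    return table


def delta(new: bytes, table: dict, bs: int) -> list[tuple[str, object]]:
    """Sender: ("copy", idx) for blocks the receiver already has,
    ("data", bytes) for literal bytes that must be transmitted."""
    ops: list[tuple[str, object]] = []
    lit = bytearray()
    n, i = len(new), 0
    if n >= bs:
        a, b = weak_sum(new[:bs])
        while True:
            strong, match = None, None
            for idx, digest in table.get((a, b), []):
                if strong is None:  # compute the strong hash only on a weak hit
                    strong = hashlib.sha256(new[i:i + bs]).digest()
                if strong == digest:
                    match = idx
                    break
            if match is not None:
                if lit:
                    ops.append(("data", bytes(lit)))
                    lit = bytearray()
                ops.append(("copy", match))
                i += bs
                if i + bs > n:
                    break
                a, b = weak_sum(new[i:i + bs])
            else:
                lit.append(new[i])
                i += 1
                if i + bs > n:
                    break
                # Roll the window one byte: drop new[i-1], add new[i+bs-1].
                x_old, x_new = new[i - 1], new[i + bs - 1]
                a = (a - x_old + x_new) % M
                b = (b - bs * x_old + a) % M
    lit += new[i:]
    if lit:
        ops.append(("data", bytes(lit)))
    return ops


def patch(old: bytes, ops: list[tuple[str, object]], bs: int) -> bytes:
    """Receiver: rebuild the new list from its old copy plus the delta."""
    out = bytearray()
    for kind, val in ops:
        out += old[val * bs:(val + 1) * bs] if kind == "copy" else val
    return bytes(out)
```

When only one entry in a long list changes, most of the delta consists of small `"copy"` references, and only the bytes around the changed entry are sent literally.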

I would appreciate hearing from the rsync developers whether this is 
feasible with the current implementation and whether they think it would 
help.

Thanks,

Peter Salameh
