rsync very slow with large include/exclude file list

ray vantassle rayvantassle at gmail.com
Mon Jun 15 17:02:14 MDT 2015


I investigated the rsync code and found the reason why.
For every file in the source, it searches the entire filter list to see
whether that filename is on the exclude/include list.  Most aren't, so it
compares (350K - 72K) * 72K names (the non-listed files) plus (72K * 72K/2)
names (the ones that are listed), for a total of about 22,608,000,000
strcmp calls.  That's 22 BILLION comparisons.  (The two terms come to
roughly 20.0 billion and 2.6 billion, so 22.6 billion is the right order
of magnitude, not 220 billion.)
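
As a quick check of that estimate, here is a small Python sketch of the
arithmetic.  The function and parameter names are made up for
illustration; the model assumes every non-listed file is compared against
the whole list, and every listed file is found, on average, halfway
through it.

```python
def estimated_comparisons(source_files, listed_files):
    """Estimate string comparisons for an unsorted linear filter search.

    Assumption: a file that is not on the list is compared against every
    entry; a file that is on the list is found after half the entries,
    on average.
    """
    unlisted = source_files - listed_files
    misses = unlisted * listed_files            # full scan for every miss
    hits = listed_files * listed_files // 2     # half a scan for every hit
    return misses + hits


total = estimated_comparisons(350_000, 72_000)
print(f"{total:,}")  # prints 22,608,000,000
```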

I'm working on a fix to improve this.  The first phase was to just improve
the existing code without changing the methodology.
The set I've been testing with is a local-to-local copy on one machine,
dry-run, with 216K files in the source directory and 25,000 files in the
exclude-from list.
The original rsync takes 488 seconds.
The improved code takes 300 seconds.

The next phase was to improve the algorithm for handling large
filter_lists: change the unsorted linear search to a sorted binary search
(skiplist).
This improved code takes 2 seconds.
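
To illustrate why the change matters, here is a minimal Python sketch
comparing the two lookup strategies on a list the size of the test
exclude list.  It is not the rsync patch: it uses a plain sorted list with
binary search rather than a skiplist, it assumes every rule is a literal
filename (no wildcards, no rule ordering), and all names in it are
hypothetical.

```python
def linear_lookup(name, rules):
    """Scan an unsorted rule list; return (found, comparisons)."""
    comparisons = 0
    for rule in rules:
        comparisons += 1
        if rule == name:
            return True, comparisons
    return False, comparisons


def binary_lookup(name, sorted_rules):
    """Binary-search a sorted rule list; return (found, comparisons).

    Each loop iteration is counted as one comparison, standing in for a
    single three-way strcmp-style call.
    """
    comparisons = 0
    lo, hi = 0, len(sorted_rules) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_rules[mid] == name:
            return True, comparisons
        if sorted_rules[mid] < name:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, comparisons


# Hypothetical exclude list of 25,000 literal names.
rules = [f"dir/file{i:05d}" for i in range(25_000)]
sorted_rules = sorted(rules)

# A name that is not on the list: the common case described above.
print(linear_lookup("not/listed", rules))         # scans all 25,000 entries
print(binary_lookup("not/listed", sorted_rules))  # needs at most 15 comparisons
```

The sort is a one-time cost when the list is loaded, after which each
lookup costs on the order of log2(N) comparisons instead of N.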

The original code does 4,492,304,682 strcmp calls.
The fully improved code does 6,472,564.  That is about 99.86% fewer
(roughly 1/694 of the original count).

I am cleaning up the code and will submit a patchfile soon.