[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
jorrit.jongma+rsync at gmail.com
Mon May 18 15:58:20 UTC 2020
Well, don't get too excited: get_checksum1() (the function optimized
here) is not the main performance limiter in this case. That would be
get_checksum2() and sum_update(), which use MD5. You can force MD4
instead, but on the slower CPUs I've tested, that turned out slower
rather than faster in practice, contrary to what would be expected.
While this patch will improve things a little, to improve things a lot
we need to tackle or replace MD5.
Unfortunately, single-stream MD5 cannot be effectively optimized with
SSE; at least, I have not seen an SSE version faster than pure C, and
I've looked into it. What we _can_ do is parallelize multiple streams
using SSE, which may double to triple the throughput at the same CPU
load under ideal circumstances. However, this cannot be applied to
rsync as-is, because rsync doesn't process multiple files simultaneously
(and it is questionable whether that is something we should even want).
The single-file stream could still be parallelized this way, but it
would require a slight change in checksum generation that would in turn
require a protocol change, since both ends need to support it. At that
point we might as well swap MD5 out completely, though I will still be
digging deeper into this case.
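
To illustrate why splitting one stream into parallel lanes changes the resulting checksum (and therefore needs support on both ends), here is a minimal sketch. It is not rsync code: FNV-1a stands in for MD5 to keep it short, and the 4-lane byte interleaving, the way lane digests are combined, and the names hash_single() and hash_split() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4               /* hypothetical lane count, e.g. four 32-bit SSE lanes */
#define FNV_BASIS 2166136261u /* FNV-1a 32-bit offset basis */
#define FNV_PRIME 16777619u   /* FNV-1a 32-bit prime */

/* One FNV-1a step. FNV-1a is only a stand-in for MD5 here. */
static uint32_t fnv1a_step(uint32_t h, unsigned char byte)
{
    return (h ^ byte) * FNV_PRIME;
}

/* Hash the whole buffer as one sequential stream. */
uint32_t hash_single(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t h = FNV_BASIS;
    for (size_t i = 0; i < len; i++)
        h = fnv1a_step(h, buf[i]);
    return h;
}

/* Hash the same buffer as LANES interleaved streams (byte i goes to
 * lane i % LANES), then hash the lane digests together. The lanes never
 * depend on each other, which is what would let a SIMD implementation
 * advance them in lockstep. */
uint32_t hash_split(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t lane[LANES];
    for (int l = 0; l < LANES; l++)
        lane[l] = FNV_BASIS;

    for (size_t i = 0; i < len; i++)
        lane[i % LANES] = fnv1a_step(lane[i % LANES], buf[i]);

    /* Combine: feed each lane digest, low byte first, into a final hash. */
    uint32_t h = FNV_BASIS;
    for (int l = 0; l < LANES; l++)
        for (int b = 0; b < 4; b++)
            h = fnv1a_step(h, (unsigned char)(lane[l] >> (8 * b)));
    return h;
}
```

For the same input, hash_split() is deterministic but will in general not equal hash_single(), so a sender using one and a receiver using the other would disagree on every checksum.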
The good news is that this parallelization _is_ possible in a drop-in
fashion for the case where rsync is comparing the chunks on both ends,
the same case where the get_checksum1() patch shows its benefits. I
estimate performance improvements could reach about 30% for that
specific case (re-transferring large yet slightly modified files), but
that does nothing for the performance of whole file checksumming or
the transfer of new files. Depending on your use case, you may rarely
or never see that performance improvement in action. It applies to
my use case though, so I am looking into this.
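
As a rough sketch of why the chunk-comparison case can be parallelized in a drop-in fashion: when several independent, equal-length blocks are hashed in lockstep, each block's digest is identical to what hashing it alone would give, so nothing changes on the wire. The code below is plain C rather than SSE intrinsics, and the toy mix() function (add/xor/rotate only), the lane count, and the names hash_block() and hash_blocks_x4() are assumptions for illustration, not rsync's MD5 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* hypothetical: four blocks processed side by side */

static uint32_t rotl32(uint32_t x, int r)
{
    return (x << r) | (x >> (32 - r));
}

/* Toy mixing step using only xor, rotate and add. It stands in for one
 * step of a real hash such as MD5 and has no cryptographic value. */
static uint32_t mix(uint32_t h, unsigned char byte)
{
    return rotl32(h ^ byte, 5) + 0x9e3779b9u;
}

/* Reference: hash one block on its own. */
uint32_t hash_block(const unsigned char *blk, size_t len)
{
    assert(blk != NULL || len == 0);
    uint32_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = mix(h, blk[i]);
    return h;
}

/* Hash LANES independent blocks of equal length in lockstep. The inner
 * loop applies the same operation to every lane, which is the shape a
 * SIMD implementation would map onto one vector register. */
void hash_blocks_x4(const unsigned char *blk[LANES], size_t len,
                    uint32_t out[LANES])
{
    uint32_t h[LANES] = {0, 0, 0, 0};
    for (size_t i = 0; i < len; i++)
        for (int l = 0; l < LANES; l++)
            h[l] = mix(h[l], blk[l][i]);
    for (int l = 0; l < LANES; l++)
        out[l] = h[l];
}
```

Each out[l] equals hash_block(blk[l], len), which is what makes this variant compatible with an unmodified peer. Blocks of unequal length (such as a final short block) would need separate handling that this sketch leaves out.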
On Mon, May 18, 2020 at 5:18 PM Ben RUBSON via rsync
<rsync at lists.samba.org> wrote:
> On 18 May 2020, at 17:06, Jorrit Jongma via rsync <rsync at lists.samba.org> wrote:
> > This drop-in patch increases the performance of the get_checksum1()
> > function on x86-64.
>
> As ref, rather related to this: https://bugzilla.samba.org/show_bug.cgi?id=13082
> Thank you Jorrit!