[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64

Jorrit Jongma jorrit.jongma+rsync at gmail.com
Mon May 18 17:02:59 UTC 2020


I think you're missing a point here. Two different checksum algorithms
are used in concert: the Adler-based one and the MD5 one. I
SSE-optimized the Adler-based one. The Adler-based hash is used to
_find_ blocks that might have shifted, while the MD5 hash is a strong
cryptographic hash used to _verify_ blocks and files. You wouldn't
want to replace the MD5 hash with the Adler-based hash, because they
are of a different class. If you were to replace the MD5 hash with a
different one, you'd replace it with one of the SHAs or even xxHash.
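
To illustrate why the Adler-based hash suits the _find_ role, here is a
minimal sketch of a rolling checksum of that general kind. It is not the
code from rsync or from the patch: the names weak_sum() and weak_roll()
are made up for this example, bytes are treated as unsigned, and no
character offset is applied, so rsync's real get_checksum1() differs in
details. The point is only that the sum for a window shifted by one byte
can be derived from the previous sum in constant time, which a hash like
MD5 cannot do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: weak checksum over buf[0..len).
 * s1 is the sum of the bytes, s2 is the sum of the running s1 values.
 * Both are kept modulo 2^16 and packed into one 32-bit word. */
uint32_t weak_sum(const unsigned char *buf, size_t len)
{
    uint32_t s1 = 0, s2 = 0;

    for (size_t i = 0; i < len; i++) {
        s1 += buf[i];
        s2 += s1;
    }
    return (s1 & 0xffff) | (s2 << 16);
}

/* Hypothetical helper: slide a window of `len` bytes forward by one.
 * `out` is the byte leaving the window, `in` the byte entering it.
 * No bytes are re-read, so the cost is O(1) per shift. */
uint32_t weak_roll(uint32_t sum, size_t len, unsigned char out, unsigned char in)
{
    assert(len > 0);

    uint32_t s1 = sum & 0xffff;
    uint32_t s2 = sum >> 16;

    s1 = (s1 - out + in) & 0xffff;
    s2 = (s2 - (uint32_t)len * out + s1) & 0xffff;
    return s1 | (s2 << 16);
}
```

A match on this weak sum only marks a candidate block. It is cheap but
collides easily, which is why the strong hash is still needed to verify
the candidate.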

On Mon, May 18, 2020 at 6:21 PM Ben RUBSON via rsync
<rsync at lists.samba.org> wrote:
>
> Thank you Jorrit for your detailed answer.
>
> > On 18 May 2020, at 17:58, Jorrit Jongma via rsync <rsync at lists.samba.org> wrote:
> >
> > Well, don't get too excited, get_checksum1() (the function optimized
> > here) is not the great performance limiter in this case, it's
> > get_checksum2() and sum_update(), which will be using MD5.
>
> Certainly that all other functions using MD5 could be updated to use your SSE-optimized function.
> So that we have a full SSE MD5 support, wherever rsync is using it (basis file checksum, rolling checksum etc...).
>
> I think one nice performance improvement could be when the receiver checksums the (big/huge) basis file, because here the sender is then simply waiting...
>
> > Unfortunately, single stream MD5 cannot be effectively optimized with
> > SSE, at least I've not seen an SSE version faster than pure C
>
> I was about to tell you that we successfully implemented it into FreeBSD a few years ago, but it's CRC32, not MD5...
> https://github.com/freebsd/freebsd/commit/c4b27423f57c30068aff3f234c912ae8d9ff1b6a
> https://github.com/freebsd/freebsd/commit/5a798b035b4858923878c014a5faa48b2f9aa6e7
> At least sounds like the algorithm author / inspiration, Mark Adler, is the same :)
>
> Anyway, this is a first interesting SSE MD5 support.


