[PATCH] SSE2/SSSE3 optimized version of get_checksum1() for x86-64
jorrit.jongma+rsync at gmail.com
Mon May 18 22:07:32 UTC 2020
Unfortunately we can't "always build" the SSSE3 code. It won't even
build unless the "-mssse3" flag is passed to GCC.
We don't want to build the entire project with this flag enabled, as
the compiler might then emit SSSE3 instructions outside of our
runtime-selected code path, which would break on CPUs that do not
support them.
A suggestion found online was isolating the SSSE3 version in
"checksum_ssse3.c" and compiling only that file with "-mssse3", but
some searching turned up reports from developers for whom even that
setup caused issues with code shared between SSE and non-SSE
objects. I think that risk is low for this case though, as we're just
doing some math and not passing anything but integers and pointers. I
wouldn't think twice about enabling it that way on one of my pet
projects, but a project as widespread as rsync should not have that in
its codebase unless we're _absolutely_ sure it doesn't cause problems
for _anybody_. A very small risk of issues times many millions of
users equals guaranteed failure.
But even if we use that method, it requires modifications to the build
scripts (check for x86-64 and exclude otherwise, pass file-specific
flags) that are beyond my experience with this build setup.
> My suggestion would be to have a get_checksum1_sse2() and
> get_checksum1_sse3() and always build them. The compiler should support
> it. Then on runtime you would check for sse3 and based on the result
> get_checksum1() would either invoke the _sse2() or sse3().
> Without auto detection it won't be utilized by distros. But yes, this
> could be improved afterwards.
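
For illustration, the runtime dispatch suggested in the quoted text
could be sketched roughly as below. This is only a sketch under
assumptions: checksum1_scalar(), cpu_has_ssse3() and pick_checksum1()
are hypothetical names, the signature is simplified, and both
get_checksum1_sse2() and get_checksum1_ssse3() are placeholders that
call a plain C loop so the example builds without "-mssse3". The CPU
check relies on the GCC/Clang builtin __builtin_cpu_supports() and
falls back to the SSE2 path elsewhere. It does not address the
build-flag problem described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar stand-in: a simplified rolling sum, not rsync's code. */
static uint32_t checksum1_scalar(const char *buf, size_t len)
{
    const signed char *p = (const signed char *)buf;
    uint32_t s1 = 0, s2 = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        s1 += (uint32_t)p[i];   /* running byte sum, wraps modulo 2^32 */
        s2 += s1;               /* running sum of the byte sums */
    }
    return (s1 & 0xffff) + (s2 << 16);
}

/* Placeholders: a real build would put the SIMD implementations here. */
static uint32_t get_checksum1_sse2(const char *buf, size_t len)
{
    return checksum1_scalar(buf, len);
}

static uint32_t get_checksum1_ssse3(const char *buf, size_t len)
{
    return checksum1_scalar(buf, len);
}

typedef uint32_t (*checksum1_fn)(const char *, size_t);

/* Assumption: GCC or Clang on x86; other targets report "no SSSE3". */
static int cpu_has_ssse3(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("ssse3") != 0;
#else
    return 0;
#endif
}

static checksum1_fn pick_checksum1(void)
{
    return cpu_has_ssse3() ? get_checksum1_ssse3 : get_checksum1_sse2;
}

/* Public entry point: resolves the implementation on the first call. */
uint32_t get_checksum1(const char *buf, size_t len)
{
    static checksum1_fn impl;

    if (!impl)
        impl = pick_checksum1();
    return impl(buf, len);
}
```

Callers keep using get_checksum1() unchanged. The lazy initialization
here is not thread-safe, so a threaded program would want to resolve
the pointer once at startup instead.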