[Bug 14798] New: Metadata traffic --- uncompressed with -z, interaction with --bwlimit and ssh compression

samba-bugs at samba.org
Tue Aug 17 09:22:06 UTC 2021


https://bugzilla.samba.org/show_bug.cgi?id=14798

            Bug ID: 14798
           Summary: Metadata traffic --- uncompressed with -z, interaction
                    with --bwlimit and ssh compression
           Product: rsync
           Version: 3.1.3
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: core
          Assignee: wayne at opencoder.net
          Reporter: zero at smallinteger.com
        QA Contact: rsync-qa at samba.org
  Target Milestone: ---

Consider the case where rsync is tasked to synchronize a large file set in
which there are few changes.  Anecdotal evidence (from a DuckDuckGo search)
suggests that, as expected, most of the network traffic is then spent
exchanging file metadata rather than file content.  The same anecdotal evidence
suggests this "file list" is not exchanged in compressed form between rsync's
endpoints, even when using the -z switch.  This seems accurate: in a simple
experiment, enabling ssh compression reduced overall bandwidth usage by roughly
2x in these cases.  This seems to be an opportunity for improvement.
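
As a rough illustration of why file-list-like traffic compresses well, here is
a minimal Python sketch.  The listing format below (path, size, mtime per line)
is a made-up stand-in and not rsync's actual wire format, and the function
names are hypothetical; it only shows that repetitive path metadata is highly
compressible by a general-purpose compressor such as zlib.

```python
import zlib


def synthetic_file_list(count):
    """Build a fake newline-separated listing of path, size, and mtime.

    This is an assumed, simplified format, not what rsync actually sends.
    """
    lines = []
    for i in range(count):
        path = f"project/src/module_{i // 100:03d}/file_{i:06d}.dat"
        size = 4096 + (i % 17) * 512
        mtime = 1629190000 + i * 3
        lines.append(f"{path} {size} {mtime}")
    return "\n".join(lines).encode("utf-8")


def compression_ratio(data, level=6):
    """Return the uncompressed size divided by the zlib-compressed size."""
    return len(data) / len(zlib.compress(data, level))


if __name__ == "__main__":
    listing = synthetic_file_list(10000)
    print(f"raw bytes: {len(listing)}")
    print(f"zlib ratio: {compression_ratio(listing):.1f}x")
```

The ratio obtained on real file lists will differ; the point is only that the
metadata is far from incompressible.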

The benefits would be compounded when using --bwlimit.  With ssh compression
disabled, the traffic on the wire respects the requested limit.  However, rsync
measures this traffic at its own endpoints, before ssh has a chance to compress
it.  Consequently, with ssh compression enabled, rsync will not use the
available bandwidth effectively, precisely because in this use case there are
very few file changes in the file set (which is the point of using rsync).
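
The arithmetic behind this can be sketched as follows, assuming rsync
throttles the bytes it hands to ssh and ssh then compresses them by some ratio.
The function names are hypothetical, and the 2x ratio is only the rough figure
observed in the experiment above; in practice the ratio varies from moment to
moment.

```python
def wire_rate(bwlimit, ssh_ratio):
    """Rate actually seen on the wire when ssh compresses after rsync throttles.

    bwlimit   -- rate rsync enforces on its own output (any unit per second)
    ssh_ratio -- uncompressed size / compressed size achieved by ssh (assumed)
    """
    if ssh_ratio <= 0:
        raise ValueError("ssh_ratio must be positive")
    return bwlimit / ssh_ratio


def bwlimit_for_target(target_wire_rate, ssh_ratio):
    """The --bwlimit value that would hit a target wire rate IF the ratio were known."""
    if ssh_ratio <= 0:
        raise ValueError("ssh_ratio must be positive")
    return target_wire_rate * ssh_ratio


if __name__ == "__main__":
    # With a limit of 1000 units/s and a 2x ssh ratio, only half the
    # allowance reaches the wire.
    print(wire_rate(1000, 2.0))           # 500.0
    print(bwlimit_for_target(1000, 2.0))  # 2000.0
```

The second function is only usable when the ratio is known in advance, which
is not the case for ssh compression.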

Note that since the ssh compression ratio is unpredictable, adjusting --bwlimit
to compensate for it is impossible.  Thus, with --bwlimit, bandwidth usage will
be either optimal with neither -z nor ssh compression (but carrying redundant
traffic, since nothing compresses the metadata), or suboptimal with ssh
compression, with or without -z (due to ssh compressing file metadata without
rsync realizing).  In these cases, the time required for rsync to complete the
task remains unchanged regardless of the form of compression.

Would it be possible for rsync's -z switch to set up the equivalent of two
compressed streams, one for file data, another for file metadata, which are
then multiplexed over the wire?  In that way, ssh compression would be entirely
unnecessary, and --bwlimit would still result in maximum efficiency even when
most traffic is file metadata.  Having rsync compress the file list is likely
to result in better compression than ssh could achieve because the shape of the
file metadata will be known to rsync.
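
To make the idea concrete, here is a minimal Python sketch of two
independently compressed streams multiplexed over one byte stream.  It is not
rsync's protocol or code: the channel tags, the 5-byte frame header, and the
Muxer/Demuxer classes are all hypothetical, and zlib stands in for whatever
compressor rsync would use.

```python
import struct
import zlib

META, DATA = 0, 1               # hypothetical channel tags
HEADER = struct.Struct("!BI")   # channel tag (1 byte), payload length (4 bytes)


class Muxer:
    """Compress each channel with its own zlib context and emit tagged frames."""

    def __init__(self):
        self._comp = {META: zlib.compressobj(), DATA: zlib.compressobj()}

    def frame(self, channel, payload):
        comp = self._comp[channel]
        # Z_SYNC_FLUSH makes the frame decodable on arrival while keeping
        # the channel's compression history for later frames.
        chunk = comp.compress(payload) + comp.flush(zlib.Z_SYNC_FLUSH)
        return HEADER.pack(channel, len(chunk)) + chunk


class Demuxer:
    """Split a byte stream into frames and decompress each one per channel."""

    def __init__(self):
        self._decomp = {META: zlib.decompressobj(), DATA: zlib.decompressobj()}

    def frames(self, wire):
        offset = 0
        while offset < len(wire):
            channel, length = HEADER.unpack_from(wire, offset)
            offset += HEADER.size
            chunk = wire[offset:offset + length]
            offset += length
            yield channel, self._decomp[channel].decompress(chunk)


if __name__ == "__main__":
    mux, demux = Muxer(), Demuxer()
    wire = (mux.frame(META, b"dir/a.txt 120 1629190000\n")
            + mux.frame(DATA, b"file contents " * 10)
            + mux.frame(META, b"dir/b.txt 340 1629190003\n"))
    for channel, payload in demux.frames(wire):
        print(channel, len(payload))
```

Because the frames are already compressed when they leave the sender, a rate
limit applied to them would match what actually travels over the wire.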

I could not find previous bug reports on this specific issue in the bug
database --- I searched for bugs related to -z and --bwlimit, and I also
searched through the release notes in case this (or an equivalent) enhancement
has been applied recently.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.

