About data/token send/receive protocol part and more

Andrey Gursky andrey.gursky at e-mail.ua
Sat Dec 26 04:05:19 UTC 2015

Dear rsync experts,

I'd like to ask you a couple of questions.

***** 1. *****
If I'd like to send additional metadata for each block, what would be the easiest and least intrusive way to do this? The following seems to work:
diff -rupN ../rsync.git/token.c ./token.c
--- ../rsync.git/token.c        2015-11-03 18:21:36.264183118 +0100
+++ ./token.c   2015-12-26 03:43:09.043841052 +0100
@@ -226,8 +226,12 @@ static int32 simple_recv_token(int f, ch
        if (residue == 0) {
                int32 i = read_int(f);
-               if (i <= 0)
+               if (i <= 0) {
+                       if (protocol_version >= 32) {
+                               int32 j = read_int(f); /* additional metadata */
+                       }
                        return i;
+               }
                residue = i;
@@ -252,8 +256,11 @@ static void simple_send_token(int f, int
        /* a -2 token means to send data only and no token */
-       if (token != -2)
+       if (token != -2) {
                write_int(f, -(token+1));
+               if (protocol_version >= 32)
+                       write_int(f, -(2*(token+1))); /* additional metadata */
+       }
 /* Flag bytes in compressed stream are encoded as follows: */

Is guarding this with protocol_version enough, or could it be done better?

***** 2. *****
Sending blocks sequentially is well suited to cases where:
1) nothing in the old target file is at the same place as in the new source and
2) no consecutive matches longer than 1024 bytes can be found.
But what if many blocks are at the same place and there are matches spanning dozens of consecutive blocks? Then this approach is no longer efficient. How about reworking (at least theoretically) only this part of the protocol (sending/receiving literal/token data) along the following lines?

For each condition (*) below: when it occurs, try to accumulate as many consecutive blocks as possible, then send the fields (1, 2, ...).

 * no matched data
  1) type of transfer (e.g., 1)
  2) offset to start writing to
  3) data length
  4) literal data
 * matched data
  1) type of transfer (e.g., 2)
  2) offset to start writing to
  3) offset to start reading from
  4) data length
 * data at the same offset
  1) type of transfer (e.g., 3)
  2) start offset
  3) data length
 * no data at all (zeros/holes)
  1) type of transfer (e.g., 4)
  2) start offset
  3) data length
The type of transfer would be a single byte. The offset and data length fields would be 64 bits long. Alternatively, the versioning could be split into two parts, with a separate version number for this part of the protocol sent before the type of transfer.
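
To make the proposed layout concrete, here is a minimal sketch of how such record headers could be encoded and how consecutive blocks could be accumulated into one run. It is not rsync code: the names xfer_type, put_u64, encode_header, struct run and run_extend are hypothetical, and the little-endian byte order for the 64-bit fields is an assumption, since the proposal does not specify one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transfer types; values follow the examples in the list above. */
enum xfer_type {
    XFER_LITERAL = 1, /* no matched data: literal bytes follow the header */
    XFER_COPY    = 2, /* matched data found at a different offset */
    XFER_SAME    = 3, /* data already present at the same offset */
    XFER_HOLE    = 4  /* no data at all (zeros/holes) */
};

/* Store a 64-bit value in little-endian order (byte order is an assumption). */
static size_t put_u64(unsigned char *buf, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        buf[i] = (unsigned char)(v >> (8 * i));
    return 8;
}

/*
 * Encode one record header into buf and return the number of bytes written.
 * Only XFER_COPY carries a source offset; for XFER_LITERAL the caller
 * appends `len` bytes of literal data after the header.
 */
static size_t encode_header(unsigned char *buf, size_t cap, enum xfer_type type,
                            uint64_t dst_off, uint64_t src_off, uint64_t len)
{
    size_t n = 0;

    assert(cap >= 25); /* largest header: 1 type byte + three 64-bit fields */
    buf[n++] = (unsigned char)type;
    n += put_u64(buf + n, dst_off);
    if (type == XFER_COPY)
        n += put_u64(buf + n, src_off);
    n += put_u64(buf + n, len);
    return n;
}

/* A run of consecutive blocks waiting to be sent as a single record. */
struct run {
    uint64_t first_block;
    uint64_t count;
};

/* Extend the run if blk directly follows it; return 0 if the run must be flushed first. */
static int run_extend(struct run *r, uint64_t blk)
{
    if (r->count != 0 && blk != r->first_block + r->count)
        return 0;
    if (r->count == 0)
        r->first_block = blk;
    r->count++;
    return 1;
}
```

With this layout a run of any length costs 17 bytes of header (25 for a copy from a different offset), whereas the current scheme sends one 4-byte token per matched block.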

What do you think? Theoretically and maybe practically? Due to protocol versioning it should be possible to change the protocol in an arbitrary fashion, since old code for the previous versions remains.

***** 3. *****
When testing such changes for regressions, some tricky corner cases can come up, e.g. the one described in "Extra writes with --inplace due to misaligned block matching" (https://bugzilla.samba.org/show_bug.cgi?id=7778).
Are there any tests for this? Or how has it been debugged to ensure the issue is fixed?

***** 4. *****
Am I right that, because the offset is sent as an int32, rsync is limited to synchronizing files of at most 2TB (using the default block size of 1024 bytes)? That sounds huge, but it no longer is nowadays. This would be one more argument for a protocol update.
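
The arithmetic behind the 2TB figure can be checked with a short sketch. It assumes the int32 carries a block index and that all 2^31 values are usable, which is what the -(token+1) encoding in the diff above suggests; the function name max_addressable_bytes is hypothetical, and the real limit in rsync may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Largest file size reachable when a signed 32-bit value indexes the blocks. */
static uint64_t max_addressable_bytes(uint64_t block_size)
{
    /* Block indices 0 .. 2^31-1, i.e. 2^31 distinct blocks (an assumption). */
    const uint64_t max_blocks = (uint64_t)INT32_MAX + 1;

    assert(block_size > 0);
    return max_blocks * block_size;
}
```

For a block size of 1024 bytes this gives 2^31 * 2^10 = 2^41 bytes, i.e. 2 TiB.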

***** 5. *****
Traffic on the rsync mailing list has dropped almost to zero. There are hundreds of open bugs that have received no attention, some important features are missing, and there is almost no development on this classic tool. Or is it legacy by now? Maybe a successor to rsync already exists and is being actively developed and maintained, and I'm just not aware of it yet? (Which would render attempts to tweak rsync obsolete.)

Thanks in advance,
