Revisiting two old issues
wayned at samba.org
Fri May 9 05:56:31 EST 2003
I'd like some opinions on a couple of long-standing rsync issues. My
two oldest, uncommitted patches are:
- A "no hang" patch that makes sure that the pipe from the receiver
to the generator can't block with resend requests.
- The "move files" patch that got changed into a --delete-sent-files
option.
For each item I have two questions: do we need to deal with this, and
is the proposed patch a good way to implement the change? Some
comments on each item follow:
Redo-Channel Anti-Hang Patch
I've had a couple of different incarnations of this patch because the IO
section is quite complex and there were concerns about memory usage (if
rsync keeps the redo channel clear, it has to note the redo items
somewhere). My first one kept a buffer of redo items that would expand
only as redo items arrived (Red Hat actually incorporated this one into
their released version of rsync 2.4.6 at some point, so it even got a
lot of testing, unbeknownst to me at the time). My later patch changed
the no-hang algorithm to keep a flag char for every item in the file
list (to avoid a criticism about having a growing buffer). I actually
prefer the first patch these days, because it has a lower memory impact
on large file lists. Anyone have an opinion on this?
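To make the trade-off concrete, here is a minimal sketch of the first approach: a buffer that grows only as redo items arrive. The names (redo_queue, redo_push, redo_pop, redo_free) are hypothetical and are not taken from either patch or from the rsync source; the doubling growth policy is also just an assumption for illustration.

```c
#include <stdlib.h>

/* Hypothetical growable queue of file-list indices that need a redo.
 * Memory is used only for items that actually arrive. */
struct redo_queue {
    int *items;     /* file-list indices, in arrival order */
    size_t head;    /* position of the next item to pop */
    size_t count;   /* number of items pushed so far */
    size_t alloc;   /* allocated capacity, in items */
};

/* Append one redo index. Returns 0 on success, -1 if out of memory. */
int redo_push(struct redo_queue *q, int ndx)
{
    if (q->count == q->alloc) {
        size_t new_alloc = q->alloc ? q->alloc * 2 : 16;
        int *p = realloc(q->items, new_alloc * sizeof *p);
        if (p == NULL)
            return -1;
        q->items = p;
        q->alloc = new_alloc;
    }
    q->items[q->count++] = ndx;
    return 0;
}

/* Remove and return the oldest redo index, or -1 if none is queued. */
int redo_pop(struct redo_queue *q)
{
    if (q->head == q->count)
        return -1;
    return q->items[q->head++];
}

/* Release the buffer and reset the queue to its empty state. */
void redo_free(struct redo_queue *q)
{
    free(q->items);
    q->items = NULL;
    q->head = q->count = q->alloc = 0;
}
```

The flag-per-item alternative would instead be a single calloc(flist_count, 1) up front. Assuming a 4-byte int, the queue above costs roughly 4 bytes per redo item (plus growth slack), while the flag array costs 1 byte per file-list entry whether or not that entry is ever resent, so the queue should come out smaller whenever redo requests are a small fraction of the list.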
More technical comments: I don't believe that we should use the
existing per-item flags in the flist data since it is shared between
two forked processes, and twiddling data in copy-on-write memory may
well cause a larger memory bloat than just keeping a separate flag
array. Of course if we eventually switch over to threads, this
copy-on-write pitfall would go away.
Another alternative implementation would be to change the rsync
algorithm to recycle the redo items immediately instead of in a
separate pass. This would eliminate the need for caching the redo
data. (Aside: have the recent checksum-length changes taken into
account the redo pass that tries to use an alternate checksum size
for the resends?)
Delete-Sent-Files Patch
I'm currently debating with myself whether I believe that rsync should
get a --delete-sent-files option. If you think of rsync as a "keeping
things in sync" program, removing a file after transferring it seems to
be a little outside the realm of rsync's purpose. However, if you think
of rsync as a feature-rich and more bandwidth-efficient copy tool, then
having the ability to move files between machines as well as copy them
seems like an appropriate addition. I certainly need to be able to move
files between systems at my work, and I haven't seen a better tool for
this than rsync with one of my old move-files patches applied to it.
I'd love to hear what people think about this issue.
Implementing this is interesting. The best way to do it is for the
receiver to send a success message back to the sender so that the
sender removes a source file only once it has been successfully sent. One
implementation was to use the redo channel for this ack, and that means
that the above no-hang patch would need to be implemented first. An
untried implementation would be to use the error channel (from the
receiver through the generator to the sender) as a way to send the
sender a "delete item X" message.
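The sender's side of such an ack scheme might look roughly like the sketch below. Everything here is hypothetical: the message tags (MSG_REDO, MSG_SUCCESS), the sent_file struct, and sender_handle_msg are invented for illustration and do not describe rsync's actual wire protocol or any of the patches. The point is only that a source file is unlinked when, and only when, a success message naming its index arrives.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical message tags carried on the back channel. */
enum msg_tag { MSG_REDO = 1, MSG_SUCCESS = 2 };

/* Hypothetical per-file record kept by the sender. */
struct sent_file {
    const char *path;   /* source path on the sending side */
    int removed;        /* nonzero once the source has been unlinked */
};

/* Handle one message from the receiving side.
 * Returns 1 if a source file was removed, 0 if nothing was done,
 * and -1 on a bad index or a failed unlink. */
int sender_handle_msg(struct sent_file *files, int nfiles,
                      enum msg_tag tag, int ndx)
{
    if (tag != MSG_SUCCESS)
        return 0;               /* only an explicit ack triggers removal */
    if (ndx < 0 || ndx >= nfiles)
        return -1;              /* never act on an index we did not send */
    if (files[ndx].removed)
        return 0;               /* duplicate ack: already handled */
    if (unlink(files[ndx].path) != 0) {
        perror(files[ndx].path);
        return -1;
    }
    files[ndx].removed = 1;
    return 1;
}
```

A file whose ack never arrives (for example because the connection dropped) is simply left in place, which is the safe failure mode for a move operation.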
An alternate approach that is conceptually simpler is to add a file-
removal pass on the sending side at the end of the transfer, but I have
grown more doubtful over time that this method would handle error
conditions properly (since we want to avoid both
erasing a file that didn't get sent and leaving a file unremoved that
did get sent).
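For comparison, an end-of-transfer removal pass could be sketched as follows. The xfer_status values, the xfer_entry struct, and removal_pass are all hypothetical names for illustration. The sketch assumes the sender has somehow recorded a per-file outcome by the end of the run, which is exactly the part that is hard to get right when errors occur.

```c
#include <unistd.h>

/* Hypothetical per-file outcome recorded by the sender. */
enum xfer_status { XFER_UNKNOWN = 0, XFER_OK, XFER_FAILED };

struct xfer_entry {
    const char *path;           /* source path on the sending side */
    enum xfer_status status;    /* outcome known at end of transfer */
};

/* Remove every source file whose transfer is confirmed.
 * Returns the number of files removed; *left is set to the number
 * of files kept (unconfirmed, failed, or unlink error). */
int removal_pass(const struct xfer_entry *list, int n, int *left)
{
    int removed = 0;

    *left = 0;
    for (int i = 0; i < n; i++) {
        if (list[i].status == XFER_OK && unlink(list[i].path) == 0)
            removed++;
        else
            (*left)++;      /* when in doubt, keep the source file */
    }
    return removed;
}
```

Treating XFER_UNKNOWN as "keep" avoids erasing a file that didn't get sent, but if the run aborts before this pass executes, files that did get sent stay unremoved, which is the second failure the paragraph above worries about.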