Patch: Offline transfer mode
Steve Traugott
stevegt at TerraLuna.Org
Mon Mar 21 04:46:20 GMT 2005
Hi All,
Here's an rsync patch which adds an --offline flag, letting you transfer
changed blocks via removable media, while still comparing checksums via
the net. I expect this could be very popular for the growing number of
people who want to do disk-based offsite backups, which is what I needed
it for.
It took me longer than I'd hoped, but still only several hours, to work
this out -- it turned out to be easy enough. Batch mode did most of
what I wanted; I just needed to keep --write-batch from also sending the
changed blocks over the net. It was harder than it could have been:
because the entire protocol goes through write_fd(), I couldn't just
shut off all traffic there, but instead had to redirect unwanted traffic
to /dev/null higher up, in send_files(). A more graceful way, I think,
might have been to create a global (say, net_quench) which is set from
send_files() and read from write_fd(), to toggle traffic on and off
depending on context. I also had to handle deleted files correctly, but
that was just a simple addition to do_unlink() and server_options().
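
The net_quench alternative described above might look roughly like the
following. This is a minimal sketch, not code from the patch or from rsync:
the names net_quench, batch_fd, write_all() and quenched_write() are
hypothetical, and it assumes a write path that copies outgoing data to a
batch descriptor before sending it over the network.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical names throughout: none of these identifiers exist in
 * rsync 2.6.3 or in this patch. */
static int net_quench = 0;  /* nonzero: suppress writes to the network */
static int batch_fd = -1;   /* batch file descriptor, or -1 if none */

/* Write all len bytes to fd, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Always record the data in the batch file (if one is open), but only
 * forward it to the network descriptor when net_quench is off. */
static int quenched_write(int net_fd, const char *buf, size_t len)
{
    assert(buf != NULL);
    if (batch_fd >= 0 && write_all(batch_fd, buf, len) < 0)
        return -1;
    if (net_quench)
        return 0;
    return write_all(net_fd, buf, len);
}
```

In this sketch a caller would set net_quench before sending file data and
clear it again afterwards, so that checksum and control traffic still reach
the network while the changed blocks land only in the batch file.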
This patch is against 2.6.3, but I just now checked CVS HEAD, and I see
that, while the patch won't apply directly to HEAD, none of the changes
are great enough to make porting it forward a chore. I'll be happy to
provide a patch against HEAD if interest is there (and it's in my own
best interests to get this into mainstream and to help maintain it
afterwards).
Here's an excerpt from the patched man page:
________________________________________________________________________
OFFLINE MODE
"Never underestimate the bandwidth of a station wagon full of tapes
hurtling down the highway." -- Andrew Tanenbaum
Note: Offline mode should be considered experimental in this version
of rsync. It depends on batch mode, which is itself experimental. The
interface and behavior may not be stable yet. Having said that, the
original developer of offline mode uses the code for production
offsite backups, so it is likely to be actively maintained.
See the ``BATCH MODE'' section before continuing.
Offline mode (--offline) is useful when you have enough network
bandwidth to compare checksums, but not enough to carry the actual
data blocks.
The drawback of ordinary batch mode is that --write-batch not only
creates the local batch file, but also sends all changed bytes over
the network. Adding the --offline flag to --write-batch prevents the
changed bytes from being sent over the network, while still creating
the local batch file. You can then hand-carry the batch file, on
removable media, to the remote site, and use --read-batch to update
the remote tree.
Offline mode correctly handles new and changed files as well as
deletions (--delete). Compression (-z) works fine, and produces a
compressed batch file (just remember to include a -z with --read-batch
at the remote end). It is safe to create a batch file multiple times
without applying it to the destination. It is safe to lose the
removable media and the batch file it contains; just run with
--offline again to regenerate it. It even appears safe to accidentally
try to apply the same batch file to the destination multiple times --
as of this writing we haven't seen this cause problems beyond the
warning messages this generates.
Example:
Let's say you have a terabyte of local disk, full of changing
production data. You need to maintain an offsite copy of this data in
case the building burns down. You don't want to buy a tape robot or
manage dozens of tapes or removable disk drives, and you don't want to
pay a lot for network bandwidth to a remote mirror site.
So you plug a single, cheap, 250 gigabyte external USB drive into your
local machine. You install a terabyte of cheap disk in the remote
machine, connect the remote machine to the Internet with a slow DSL
line, and run this on the local machine (assume the USB drive is
mounted as /mnt/usb):
local $ rsync -e ssh --offline --write-batch=/mnt/usb/foo \
-az /sourcedir/ remote:/destdir/
You then use the station wagon (or FedEx) to carry the USB drive to
the remote location. Plug it into the remote machine, mount it as
/mnt/usb, and say:
remote $ rsync --read-batch=/mnt/usb/foo \
-az /destdir/
You can add --delete to the above commands as well.
You might need to do this write/carry/read round-trip a few times to
get the remote machine fully updated the first time -- you'll know
this is the case if your USB drive is filling up. For this offsite
backup example it is relatively safe to allow the removable media to
fill up; just apply the (truncated) batch file to the destination,
then generate a new (shorter) batch file, and repeat the cycle until
the destination is fully up to date and the USB drive stops filling
each time. While this is going on, the remote machine will only be
partially updated, just as if you cut off an rsync transfer over the
net in mid-stream, so keep this in mind.
Once the initial update is done, you'll likely find that the batch
file size is quite small each trip, depending on the change rate of
your data and how often you update. For most organizations a single
250 gigabyte USB drive should be much more than enough to do weekly
offsite backups of more than a terabyte of data.
For simplicity, this example ignores weekly, monthly, and yearly
archives. You'll want those -- depending on your needs, you should in
most cases be able to maintain them on your terabyte of disk,
alongside the live data, with local rsync runs: see the --link-dest
flag.
________________________________________________________________________
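
The sizing reasoning in the excerpt above (how many write/carry/read round
trips the first sync takes, and whether a weekly batch fits on the drive)
can be worked through with a little arithmetic. The helpers below are a
rough sketch with hypothetical names. They assume the batch file is about
as large as the data still pending, and they ignore compression, batch file
overhead, and data that changes while a cycle is in progress.

```c
#include <assert.h>
#include <stdint.h>

#define GB UINT64_C(1000000000)  /* decimal gigabyte */

/* Hypothetical helper: number of write/carry/read round trips needed to
 * move pending_bytes when each trip carries at most media_bytes. */
static uint64_t trips_needed(uint64_t pending_bytes, uint64_t media_bytes)
{
    assert(media_bytes > 0);
    return pending_bytes / media_bytes + (pending_bytes % media_bytes != 0);
}

/* Hypothetical helper: returns nonzero if one period's batch fits on the
 * media, given the percentage of the data that changes per period. */
static int batch_fits(uint64_t data_bytes, unsigned change_percent,
                      uint64_t media_bytes)
{
    assert(change_percent <= 100);
    return data_bytes / 100 * change_percent <= media_bytes;
}
```

Under these assumptions, trips_needed(1000 * GB, 250 * GB) gives 4 round
trips for the initial terabyte, and batch_fits(1000 * GB, 25, 250 * GB)
holds, so the 250 gigabyte drive in the example covers a weekly change rate
of up to about 25 percent.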
Index: rsync-2.6.3/sender.c
===================================================================
--- rsync-2.6.3.orig/sender.c 2004-09-19 21:17:42.%N -0700
+++ rsync-2.6.3/sender.c 2005-03-19 12:26:03.%N -0800
@@ -29,6 +29,9 @@
extern int protocol_version;
extern int make_backups;
extern struct stats stats;
+extern int write_batch;
+extern int offline;
+extern int write_batch_monitor_out;
/**
@@ -123,16 +126,27 @@
struct stats initial_stats;
int save_make_backups = make_backups;
int j;
+ int f_alt, old_monitor;
if (verbose > 2)
rprintf(FINFO, "send_files starting\n");
+ old_monitor = write_batch_monitor_out;
+ if (write_batch && offline) {
+ /* offline send: don't send tokens over wire */
+ f_alt = open("/dev/null", O_RDWR);
+ write_batch_monitor_out = f_alt;
+ } else {
+ f_alt = f_out;
+ }
+
while (1) {
unsigned int offset;
i = read_int(f_in);
if (i == -1) {
if (phase == 0) {
+ write_batch_monitor_out = old_monitor;
phase++;
csum_length = SUM_LENGTH;
write_int(f_out, -1);
@@ -173,7 +187,7 @@
if (dry_run) {
if (!am_server && verbose) /* log the transfer */
rprintf(FINFO, "%s\n", safe_fname(fname2));
- write_int(f_out, i);
+ write_int(f_alt, i);
continue;
}
@@ -224,8 +238,8 @@
safe_fname(fname), (double)st.st_size);
}
- write_int(f_out, i);
- write_sum_head(f_out, s);
+ write_int(f_alt, i);
+ write_sum_head(f_alt, s);
if (verbose > 2) {
rprintf(FINFO, "calling match_sums %s\n",
@@ -237,7 +251,7 @@
set_compression(fname);
- match_sums(f_out, s, mbuf, st.st_size);
+ match_sums(f_alt, s, mbuf, st.st_size);
log_send(file, &initial_stats);
if (mbuf) {
Index: rsync-2.6.3/options.c
===================================================================
--- rsync-2.6.3.orig/options.c 2005-03-18 17:19:30.%N -0800
+++ rsync-2.6.3/options.c 2005-03-19 12:45:22.%N -0800
@@ -113,6 +113,7 @@
int write_batch = 0;
int read_batch = 0;
+int offline = 0;
int backup_dir_len = 0;
int backup_suffix_len;
unsigned int backup_dir_remainder;
@@ -304,6 +305,7 @@
rprintf(F," --bwlimit=KBPS limit I/O bandwidth, KBytes per second\n");
rprintf(F," --write-batch=FILE write a batch to FILE\n");
rprintf(F," --read-batch=FILE read a batch from FILE\n");
+ rprintf(F," --offline make changes via batch file only\n");
rprintf(F," --checksum-seed=NUM set block/file checksum seed\n");
#ifdef INET6
rprintf(F," -4, --ipv4 prefer IPv4\n");
@@ -401,6 +403,7 @@
{"hard-links", 'H', POPT_ARG_NONE, &preserve_hard_links, 0, 0, 0 },
{"read-batch", 0, POPT_ARG_STRING, &batch_name, OPT_READ_BATCH, 0, 0 },
{"write-batch", 0, POPT_ARG_STRING, &batch_name, OPT_WRITE_BATCH, 0, 0 },
+ {"offline", 0, POPT_ARG_NONE, &offline, 0, 0, 0 },
{"files-from", 0, POPT_ARG_STRING, &files_from, 0, 0, 0 },
{"from0", '0', POPT_ARG_NONE, &eol_nulls, 0, 0, 0},
{"no-implied-dirs", 0, POPT_ARG_VAL, &implied_dirs, 0, 0, 0 },
@@ -698,6 +701,15 @@
MAX_BATCH_NAME_LEN);
return 0;
}
+ if (offline && am_sender && !write_batch) {
+ snprintf(err_buf, sizeof err_buf,
+ "--write-batch must be used with --offline\n");
+ return 0;
+ }
+ if (offline && read_batch) {
+ /* --offline not needed (and harmful) on read */
+ offline = 0;
+ }
if (tmpdir && strlen(tmpdir) >= MAXPATHLEN - 10) {
snprintf(err_buf, sizeof err_buf,
@@ -1083,6 +1095,9 @@
args[ac++] = "--temp-dir";
args[ac++] = tmpdir;
}
+
+ if (offline)
+ args[ac++] = "--offline";
if (compare_dest && am_sender) {
/* the server only needs this option if it is not the sender,
Index: rsync-2.6.3/receiver.c
===================================================================
--- rsync-2.6.3.orig/receiver.c 2004-09-21 02:24:06.%N -0700
+++ rsync-2.6.3/receiver.c 2005-03-19 12:35:34.%N -0800
@@ -28,6 +28,7 @@
extern struct stats stats;
extern int dry_run;
extern int read_batch;
+extern int offline;
extern int batch_gen_fd;
extern int am_server;
extern int relative_paths;
Index: rsync-2.6.3/io.c
===================================================================
--- rsync-2.6.3.orig/io.c 2004-08-01 19:43:54.%N -0700
+++ rsync-2.6.3/io.c 2005-03-19 04:33:29.%N -0800
@@ -88,7 +88,7 @@
static int no_flush;
static int write_batch_monitor_in = -1;
-static int write_batch_monitor_out = -1;
+int write_batch_monitor_out = -1;
static int io_filesfrom_f_in = -1;
static int io_filesfrom_f_out = -1;
Index: rsync-2.6.3/syscall.c
===================================================================
--- rsync-2.6.3.orig/syscall.c 2004-08-02 14:56:07.%N -0700
+++ rsync-2.6.3/syscall.c 2005-03-19 12:35:36.%N -0800
@@ -27,6 +27,7 @@
#include "rsync.h"
extern int dry_run;
+extern int offline;
extern int read_only;
extern int list_only;
extern int preserve_perms;
@@ -43,7 +44,7 @@
int do_unlink(char *fname)
{
- if (dry_run) return 0;
+ if (dry_run || offline) return 0;
RETURN_ERROR_IF_RO_OR_LO;
return unlink(fname);
}
Index: rsync-2.6.3/rsync.1
===================================================================
--- rsync-2.6.3.orig/rsync.1 2005-03-18 17:19:29.%N -0800
+++ rsync-2.6.3/rsync.1 2005-03-19 23:09:32.%N -0800
@@ -424,6 +424,7 @@
\-\-bwlimit=KBPS limit I/O bandwidth, KBytes per second
\-\-write\-batch=FILE write a batch to FILE
\-\-read\-batch=FILE read a batch from FILE
+ \-\-offline make changes via batch file only
\-\-checksum\-seed=NUM set block/file checksum seed
\-4 \-\-ipv4 prefer IPv4
\-6 \-\-ipv6 prefer IPv6
@@ -1168,6 +1169,12 @@
If \fIFILE\fP is \(lq\-\(rq the batch data will be read from standard input\&.
See the \(lqBATCH MODE\(rq section for details\&.
.IP
+.IP "\fB\-\-offline\fP"
+Only record batch file -- do not send changes over the network. Use with
+\-\-write\-batch\&. Useful for keeping a large remote repository in sync
+with a local one when there is little network bandwidth available.
+See the \(lqOFFLINE MODE\(rq section for details\&.
+.IP
.IP "\fB\-4, \-\-ipv4\fP or \fB\-6, \-\-ipv6\fP"
Tells rsync to prefer IPv4/IPv6
when creating sockets\&. This only affects sockets that rsync has direct
@@ -1512,6 +1519,108 @@
.PP
The original batch mode in rsync was based on "rsync+", but the latest
version uses a new implementation\&.
+.PP
+.SH "OFFLINE MODE"
+.PP
+"Never underestimate the bandwidth of a station wagon full of tapes
+hurtling down the highway\&." \-\- Andrew Tanenbaum
+.PP
+\fBNote:\fP Offline mode should be considered experimental in this version
+of rsync\&. It depends on batch mode, which is itself experimental\&. The
+interface and behavior may not be stable yet\&. Having said
+that, the original developer of offline mode uses the code for production
+offsite backups, so it is likely to be actively maintained.
+.PP
+See the \(lqBATCH MODE\(rq section before continuing\&.
+.PP
+Offline mode (\-\-offline)
+is useful when you have enough network
+bandwidth to compare checksums, but not
+enough to carry the actual data blocks\&.
+.PP
+The drawback of ordinary batch mode is that \-\-write\-batch
+not only creates the local batch file, but also sends all changed
+bytes over the network. Adding the \-\-offline
+flag to \-\-write\-batch
+prevents the changed bytes from being sent over the network,
+while still creating the local batch file\&. You can then hand\-carry
+the batch file, on removable media, to the remote site, and use
+\-\-read\-batch to update the remote tree\&.
+.PP
+Offline mode correctly handles new and changed files
+as well as deletions (\-\-delete)\&. Compression (\-z)
+works fine, and produces a compressed batch file (just remember to
+include a \-z with \-\-read\-batch at the remote end)\&. It is safe to
+create a batch file
+multiple times without applying it to the destination. It
+is safe to lose the removable media and the batch file it contains;
+just run with \-\-offline again to regenerate it.
+It even appears
+safe to accidentally try to apply the same batch file to the destination
+multiple times \-\- as of this writing we haven't seen this cause
+problems beyond the warning messages this generates\&.
+.PP
+Example:
+.PP
+Let's say you have a terabyte of local disk, full of changing
+production data\&. You need to maintain an offsite copy of this
+data in case the
+building burns down\&. You don't want to buy a tape robot or manage
+dozens of tapes or
+removable disk drives, and you don't want to pay a lot for network
+bandwidth to a remote mirror site\&.
+.PP
+So you plug a single, cheap, 250 gigabyte external USB drive into your
+local machine. You install a terabyte of cheap disk in the remote
+machine, connect the remote machine to the Internet with a slow DSL line,
+and run this on the local machine (assume the USB drive is mounted as
+/mnt/usb):
+
+.nf
+
+ local $ rsync -e ssh \-\-offline \-\-write\-batch=/mnt/usb/foo \\
+ \-az /sourcedir/ remote:/destdir/
+
+.fi
+
+You then use the station wagon (or FedEx) to carry the USB drive to
+the remote location\&. Plug it into the remote machine, mount it as
+/mnt/usb, and say:
+
+.nf
+
+ remote $ rsync \-\-read\-batch=/mnt/usb/foo \\
+ \-az /destdir/
+
+.fi
+
+You can add \-\-delete to the above commands as well.
+.PP
+You might need
+to do this write/carry/read round\-trip a few times to get
+the remote machine fully updated the first time \-\- you'll know this
+is the case if your USB drive is filling up\&. For this offsite
+backup example it is relatively safe to allow
+the removable media to fill up; just apply the (truncated) batch file to
+the destination, then generate a new (shorter) batch file, and repeat
+the cycle until the destination is fully up to date and the USB drive
+stops filling each time\&. While this is going on, the remote
+machine will only be partially updated, just as if you cut off an
+rsync transfer over the net in mid\-stream, so keep this in mind\&.
+.PP
+Once the initial update is done, you'll likely find that the batch
+file size is quite small each trip, depending on the change rate
+of your data and how often you update\&. For most organizations a
+single 250
+gigabyte USB drive should be much more than enough to do weekly offsite
+backups of more than a terabyte of data\&.
+.PP
+For simplicity, this example ignores weekly, monthly, and
+yearly archives\&. You'll want those \-\- depending on your needs,
+you should in most cases be able to maintain them on your
+terabyte of disk, alongside the live data, with local rsync runs:
+see the \-\-link\-dest flag.
+
.PP
.SH "SYMBOLIC LINKS"
.PP
Steve
--
Stephen G. Traugott (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt at TerraLuna.Org
http://www.stevegt.com -- http://Infrastructures.Org