Patch: Offline transfer mode

Steve Traugott stevegt at TerraLuna.Org
Mon Mar 21 04:46:20 GMT 2005


Hi All,

Here's an rsync patch which adds an --offline flag, letting you transfer
changed blocks via removable media, while still comparing checksums via
the net.  I expect this could be very popular with the growing number of
people who want to do disk-based offsite backups, which is what I needed
it for.

It took me longer than I had hoped, but still only several hours to work
this out; it turned out to be easy enough.  Batch mode already did most of
what I wanted; I just needed to keep --write-batch from also sending the
changed blocks over the net.  That was harder than it could have been:
because the entire protocol goes through write_fd(), I couldn't just shut
off all traffic there, and instead had to redirect the unwanted traffic to
/dev/null higher up, in send_files().  A more graceful approach, I think,
might have been a global flag (say, net_quench) that is set from
send_files() and read from write_fd(), to toggle traffic on and off
depending on context.  I also had to handle deleted files correctly, but
that was just a simple addition to do_unlink() and server_options().
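
A rough sketch of that net_quench alternative follows.  It is only an
illustration, not rsync 2.6.3 code: net_quench, batch_fd, write_all() and
quenched_write() are hypothetical names, and real rsync buffers and
multiplexes its output, which this ignores.  The idea is that every
outgoing write is still copied to the batch file, but is skipped on the
network descriptor while the flag is set.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical global: when nonzero, data is recorded but not sent. */
static int net_quench = 0;
/* Hypothetical batch-file descriptor; -1 means no batch is being written. */
static int batch_fd = -1;

/* Write all of buf to fd, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
	assert(fd >= 0);
	while (len > 0) {
		ssize_t n = write(fd, buf, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Hypothetical stand-in for write_fd(): tee the data into the batch
 * file, then send it to the peer unless traffic is quenched. */
static int quenched_write(int net_fd, const char *buf, size_t len)
{
	if (batch_fd >= 0 && write_all(batch_fd, buf, len) < 0)
		return -1;
	if (net_quench)
		return 0; /* recorded in the batch, dropped from the wire */
	return write_all(net_fd, buf, len);
}
```

With this design the sender would only set net_quench around the
changed-block data, so checksum and control traffic keep flowing
normally.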

This patch is against 2.6.3, but I just now checked CVS HEAD, and I see
that, while the patch won't apply directly to HEAD, none of the changes
are great enough to make porting it forward a chore.  I'll be happy to
provide a patch against HEAD if interest is there (and it's in my own
best interests to get this into mainstream and to help maintain it
afterwards).

Here's an excerpt from the patched man page:

________________________________________________________________________

OFFLINE MODE

   "Never underestimate the bandwidth of a station wagon full of tapes
   hurtling down the highway." -- Andrew Tanenbaum

   Note: Offline mode should be considered experimental in this version
   of rsync. It depends on batch mode, which is itself experimental. The
   interface and behavior may not be stable yet. Having said that, the
   original developer of offline mode uses the code for production
   offsite backups, so it is likely to be actively maintained.

   See the ``BATCH MODE'' section before continuing.

   Offline mode (--offline) is useful when you have enough network
   bandwidth to compare checksums, but not enough to carry the actual
   data blocks.
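
   To see why the checksum exchange can be so much cheaper than the data
   itself, here is a back-of-the-envelope estimate.  In rsync's protocol
   the receiving side sends, for each block of the existing destination
   file, a 4-byte rolling checksum plus a strong checksum truncated to a
   few bytes.  rsync picks the block length and strong-checksum length per
   file; the values passed to this hypothetical checksum_bytes() helper are
   illustrative assumptions, and per-file headers and framing are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate bytes of block checksums sent for one file.
 * file_len:   size of the existing destination file, in bytes
 * block_len:  block size (assumed; rsync chooses it per file)
 * strong_len: strong-checksum bytes kept per block (assumed) */
static uint64_t checksum_bytes(uint64_t file_len, uint64_t block_len,
			       uint64_t strong_len)
{
	assert(block_len > 0);
	uint64_t blocks = (file_len + block_len - 1) / block_len; /* round up */
	return blocks * (4 + strong_len); /* 4 bytes = rolling checksum */
}
```

   For a 1,000,000-byte file with 1000-byte blocks and 2-byte strong
   checksums this comes to 6000 bytes, about 0.6% of the file.  The
   literal data for changed blocks is what --offline keeps off the wire.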

   The drawback of ordinary batch mode is that --write-batch not only
   creates the local batch file, but also sends all changed bytes over
   the network. Adding the --offline flag to --write-batch prevents the
   changed bytes from being sent over the network, while still creating
   the local batch file. You can then hand-carry the batch file, on
   removable media, to the remote site, and use --read-batch to update
   the remote tree.
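
   The rules the patch adds to options.c for combining these flags can be
   restated as a small function.  This is an illustration outside rsync:
   struct opts and check_offline_opts() are hypothetical names, whereas the
   real code works on rsync's global variables and err_buf.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bundle of the globals the patch consults. */
struct opts {
	int offline;     /* --offline */
	int am_sender;   /* this side sends the file data */
	int write_batch; /* --write-batch=FILE */
	int read_batch;  /* --read-batch=FILE */
};

/* Returns NULL if the combination is acceptable (possibly after
 * adjusting *o), or an error message otherwise.  Mirrors the checks
 * the patch adds to options.c. */
static const char *check_offline_opts(struct opts *o)
{
	assert(o != NULL);
	if (o->offline && o->am_sender && !o->write_batch)
		return "--write-batch must be used with --offline";
	if (o->offline && o->read_batch)
		o->offline = 0; /* not needed (and harmful) when reading */
	return NULL;
}
```

   The first check only applies to the sending side; the remote receiver,
   which gets --offline from server_options(), is not the sender, so it is
   allowed to run with --offline alone.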

   Offline mode correctly handles new and changed files as well as
   deletions (--delete). Compression (-z) works fine, and produces a
   compressed batch file (just remember to include a -z with --read-batch
   at the remote end). It is safe to create a batch file multiple times
   without applying it to the destination. It is safe to lose the
   removable media and the batch file it contains; just run with
   --offline again to regenerate it. It even appears safe to accidentally
   try to apply the same batch file to the destination multiple times --
   as of this writing we haven't seen this cause problems beyond the
   warning messages this generates.
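
   For deletions, the patch passes --offline to the remote rsync in
   server_options() and makes do_unlink() return early when offline is set,
   so a --write-batch --offline run that includes --delete does not unlink
   files on the destination.  On the read side the patch clears offline, so
   do_unlink() behaves normally there.  The guard is sketched below;
   guarded_unlink() is a hypothetical stand-in for the patched do_unlink(),
   without rsync's read-only and list-only checks.

```c
#include <assert.h>
#include <unistd.h>

static int dry_run = 0;
static int offline = 0;

/* Like the patched do_unlink(): report success without touching the
 * file system when either dry_run or offline is set. */
static int guarded_unlink(const char *fname)
{
	assert(fname != NULL);
	if (dry_run || offline)
		return 0;
	return unlink(fname);
}
```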

   Example:

   Let's say you have a terabyte of local disk, full of changing
   production data. You need to maintain an offsite copy of this data in
   case the building burns down. You don't want to buy a tape robot or
   manage dozens of tapes or removable disk drives, and you don't want to
   pay a lot for network bandwidth to a remote mirror site.

   So you plug a single, cheap, 250 gigabyte external USB drive into your
   local machine. You install a terabyte of cheap disk in the remote
   machine, connect the remote machine to the Internet with a slow DSL
   line, and run this on the local machine (assume the USB drive is
   mounted as /mnt/usb):

   local $ rsync -e ssh --offline --write-batch=/mnt/usb/foo \
          -az /sourcedir/ remote:/destdir/

   You then use the station wagon (or FedEx) to carry the USB drive to
   the remote location. Plug it into the remote machine, mount it as
   /mnt/usb, and say:

   remote $ rsync --read-batch=/mnt/usb/foo \
          -az /destdir/

   You can add --delete to the above commands as well.

   You might need to do this write/carry/read round-trip a few times to
   get the remote machine fully updated the first time -- you'll know
   this is the case if your USB drive is filling up. For this offsite
   backup example it is relatively safe to allow the removable media to
   fill up; just apply the (truncated) batch file to the destination,
   then generate a new (shorter) batch file, and repeat the cycle until
   the destination is fully up to date and the USB drive stops filling
   each time. While this is going on, the remote machine will only be
   partially updated, just as if you cut off an rsync transfer over the
   net in mid-stream, so keep this in mind.
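
   The initial seeding cycle can be modeled to estimate how many trips it
   takes.  The hypothetical seeding_trips() below assumes that nothing
   changes while the cycle runs, that each batch carries exactly the bytes
   still outstanding up to the drive's capacity, and that compression and
   batch-file overhead can be ignored; under those assumptions it simply
   counts round trips.

```c
#include <assert.h>
#include <stdint.h>

/* Count write/carry/read round trips needed to seed the destination.
 * Each trip carries at most `capacity` units; whatever does not fit
 * is left for the next trip. */
static unsigned seeding_trips(uint64_t outstanding, uint64_t capacity)
{
	unsigned trips = 0;

	assert(capacity > 0);
	while (outstanding > 0) {
		uint64_t batch = outstanding < capacity ? outstanding : capacity;
		outstanding -= batch; /* applied with --read-batch at the remote end */
		trips++;
	}
	return trips;
}
```

   With the example's figures measured in gigabytes, seeding_trips(1000, 250)
   gives four trips; expect fewer if -z compresses the data well, and more
   if the data keeps changing while the cycle runs.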

   Once the initial update is done, you'll likely find that the batch
   file size is quite small each trip, depending on the change rate of
   your data and how often you update. For most organizations a single
   250 gigabyte USB drive should be much more than enough to do weekly
   offsite backups of more than a terabyte of data.

   For simplicity, this example ignores weekly, monthly, and yearly
   archives. You'll want those -- depending on your needs, you should in
   most cases be able to maintain them on your terabyte of disk,
   alongside the live data, with local rsync runs: see the --link-dest
   flag.


________________________________________________________________________


Index: rsync-2.6.3/sender.c
===================================================================
--- rsync-2.6.3.orig/sender.c	2004-09-19 21:17:42.%N -0700
+++ rsync-2.6.3/sender.c	2005-03-19 12:26:03.%N -0800
@@ -29,6 +29,9 @@
 extern int protocol_version;
 extern int make_backups;
 extern struct stats stats;
+extern int write_batch;
+extern int offline;
+extern int write_batch_monitor_out;
 
 
 /**
@@ -123,16 +126,27 @@
 	struct stats initial_stats;
 	int save_make_backups = make_backups;
 	int j;
+	int f_alt, old_monitor;
 
 	if (verbose > 2)
 		rprintf(FINFO, "send_files starting\n");
 
+	old_monitor = write_batch_monitor_out;
+	if (write_batch && offline) {
+		/* offline send: don't send tokens over wire */
+		f_alt = open("/dev/null", O_RDWR);
+		write_batch_monitor_out = f_alt;
+	} else {
+		f_alt = f_out;
+	}
+
 	while (1) {
 		unsigned int offset;
 
 		i = read_int(f_in);
 		if (i == -1) {
 			if (phase == 0) {
+				write_batch_monitor_out = old_monitor;
 				phase++;
 				csum_length = SUM_LENGTH;
 				write_int(f_out, -1);
@@ -173,7 +187,7 @@
 		if (dry_run) {
 			if (!am_server && verbose) /* log the transfer */
 				rprintf(FINFO, "%s\n", safe_fname(fname2));
-			write_int(f_out, i);
+			write_int(f_alt, i);
 			continue;
 		}
 
@@ -224,8 +238,8 @@
 				safe_fname(fname), (double)st.st_size);
 		}
 
-		write_int(f_out, i);
-		write_sum_head(f_out, s);
+		write_int(f_alt, i);
+		write_sum_head(f_alt, s);
 
 		if (verbose > 2) {
 			rprintf(FINFO, "calling match_sums %s\n",
@@ -237,7 +251,7 @@
 
 		set_compression(fname);
 
-		match_sums(f_out, s, mbuf, st.st_size);
+		match_sums(f_alt, s, mbuf, st.st_size);
 		log_send(file, &initial_stats);
 
 		if (mbuf) {
Index: rsync-2.6.3/options.c
===================================================================
--- rsync-2.6.3.orig/options.c	2005-03-18 17:19:30.%N -0800
+++ rsync-2.6.3/options.c	2005-03-19 12:45:22.%N -0800
@@ -113,6 +113,7 @@
 
 int write_batch = 0;
 int read_batch = 0;
+int offline = 0;
 int backup_dir_len = 0;
 int backup_suffix_len;
 unsigned int backup_dir_remainder;
@@ -304,6 +305,7 @@
   rprintf(F,"     --bwlimit=KBPS          limit I/O bandwidth, KBytes per second\n");
   rprintf(F,"     --write-batch=FILE      write a batch to FILE\n");
   rprintf(F,"     --read-batch=FILE       read a batch from FILE\n");
+  rprintf(F,"     --offline               make changes via batch file only\n");
   rprintf(F,"     --checksum-seed=NUM     set block/file checksum seed\n");
 #ifdef INET6
   rprintf(F," -4, --ipv4                  prefer IPv4\n");
@@ -401,6 +403,7 @@
   {"hard-links",      'H', POPT_ARG_NONE,   &preserve_hard_links, 0, 0, 0 },
   {"read-batch",       0,  POPT_ARG_STRING, &batch_name,  OPT_READ_BATCH, 0, 0 },
   {"write-batch",      0,  POPT_ARG_STRING, &batch_name,  OPT_WRITE_BATCH, 0, 0 },
+  {"offline",          0,  POPT_ARG_NONE,   &offline, 0, 0, 0 },
   {"files-from",       0,  POPT_ARG_STRING, &files_from, 0, 0, 0 },
   {"from0",           '0', POPT_ARG_NONE,   &eol_nulls, 0, 0, 0},
   {"no-implied-dirs",  0,  POPT_ARG_VAL,    &implied_dirs, 0, 0, 0 },
@@ -698,6 +701,15 @@
 			MAX_BATCH_NAME_LEN);
 		return 0;
 	}
+	if (offline && am_sender && !write_batch) {
+		snprintf(err_buf, sizeof err_buf,
+			"--write-batch must be used with --offline\n");
+		return 0;
+	}
+	if (offline && read_batch) {
+		/* --offline not needed (and harmful) on read */
+		offline = 0;
+	}
 
 	if (tmpdir && strlen(tmpdir) >= MAXPATHLEN - 10) {
 		snprintf(err_buf, sizeof err_buf,
@@ -1083,6 +1095,9 @@
 		args[ac++] = "--temp-dir";
 		args[ac++] = tmpdir;
 	}
+
+	if (offline)
+		args[ac++] = "--offline";
 
 	if (compare_dest && am_sender) {
 		/* the server only needs this option if it is not the sender,
Index: rsync-2.6.3/receiver.c
===================================================================
--- rsync-2.6.3.orig/receiver.c	2004-09-21 02:24:06.%N -0700
+++ rsync-2.6.3/receiver.c	2005-03-19 12:35:34.%N -0800
@@ -28,6 +28,7 @@
 extern struct stats stats;
 extern int dry_run;
 extern int read_batch;
+extern int offline;
 extern int batch_gen_fd;
 extern int am_server;
 extern int relative_paths;
Index: rsync-2.6.3/io.c
===================================================================
--- rsync-2.6.3.orig/io.c	2004-08-01 19:43:54.%N -0700
+++ rsync-2.6.3/io.c	2005-03-19 04:33:29.%N -0800
@@ -88,7 +88,7 @@
 static int no_flush;
 
 static int write_batch_monitor_in = -1;
-static int write_batch_monitor_out = -1;
+int write_batch_monitor_out = -1;
 
 static int io_filesfrom_f_in = -1;
 static int io_filesfrom_f_out = -1;
Index: rsync-2.6.3/syscall.c
===================================================================
--- rsync-2.6.3.orig/syscall.c	2004-08-02 14:56:07.%N -0700
+++ rsync-2.6.3/syscall.c	2005-03-19 12:35:36.%N -0800
@@ -27,6 +27,7 @@
 #include "rsync.h"
 
 extern int dry_run;
+extern int offline;
 extern int read_only;
 extern int list_only;
 extern int preserve_perms;
@@ -43,7 +44,7 @@
 
 int do_unlink(char *fname)
 {
-	if (dry_run) return 0;
+	if (dry_run || offline) return 0;
 	RETURN_ERROR_IF_RO_OR_LO;
 	return unlink(fname);
 }
Index: rsync-2.6.3/rsync.1
===================================================================
--- rsync-2.6.3.orig/rsync.1	2005-03-18 17:19:29.%N -0800
+++ rsync-2.6.3/rsync.1	2005-03-19 23:09:32.%N -0800
@@ -424,6 +424,7 @@
      \-\-bwlimit=KBPS          limit I/O bandwidth, KBytes per second
      \-\-write\-batch=FILE      write a batch to FILE 
      \-\-read\-batch=FILE       read a batch from FILE
+     \-\-offline               make changes via batch file only
      \-\-checksum\-seed=NUM     set block/file checksum seed
  \-4  \-\-ipv4                  prefer IPv4
  \-6  \-\-ipv6                  prefer IPv6
@@ -1168,6 +1169,12 @@
 If \fIFILE\fP is \(lq\-\(rq the batch data will be read from standard input\&.
 See the \(lqBATCH MODE\(rq section for details\&.
 .IP 
+.IP "\fB\-\-offline\fP" 
+Only write the batch file; do not send changes over the network\&.  Use with
+\-\-write\-batch\&.  Useful for keeping a large remote repository in sync
+with a local one when there is little network bandwidth available.
+See the \(lqOFFLINE MODE\(rq section for details\&.
+.IP 
 .IP "\fB\-4, \-\-ipv4\fP or \fB\-6, \-\-ipv6\fP" 
 Tells rsync to prefer IPv4/IPv6
 when creating sockets\&.  This only affects sockets that rsync has direct
@@ -1512,6 +1519,108 @@
 .PP 
 The original batch mode in rsync was based on "rsync+", but the latest
 version uses a new implementation\&.
+.PP 
+.SH "OFFLINE MODE" 
+.PP 
+"Never underestimate the bandwidth of a station wagon full of tapes
+hurtling down the highway\&." \-\- Andrew Tanenbaum
+.PP 
+\fBNote:\fP Offline mode should be considered experimental in this version
+of rsync\&.  It depends on batch mode, which is itself experimental\&.  The
+interface and behavior may not be stable yet\&.  Having said
+that, the original developer of offline mode uses the code for production 
+offsite backups, so it is likely to be actively maintained.
+.PP 
+See the \(lqBATCH MODE\(rq section before continuing\&.
+.PP
+Offline mode (\-\-offline) 
+is useful when you have enough network
+bandwidth to compare checksums, but not
+enough to carry the actual data blocks\&.  
+.PP
+The drawback of ordinary batch mode is that \-\-write\-batch 
+not only creates the local batch file, but also sends all changed
+bytes over the network.  Adding the \-\-offline 
+flag to \-\-write\-batch
+prevents the changed bytes from being sent over the network,
+while still creating the local batch file\&.  You can then hand\-carry 
+the batch file, on removable media, to the remote site, and use
+\-\-read\-batch to update the remote tree\&.
+.PP
+Offline mode correctly handles new and changed files 
+as well as deletions (\-\-delete)\&.  Compression (\-z)
+works fine, and produces a compressed batch file (just remember to
+include a \-z with \-\-read\-batch at the remote end)\&.  It is safe to 
+create a batch file
+multiple times without applying it to the destination.  It
+is safe to lose the removable media and the batch file it contains;
+just run with \-\-offline again to regenerate it.  
+It even appears 
+safe to accidentally try to apply the same batch file to the destination 
+multiple times \-\- as of this writing we haven't seen this cause
+problems beyond the warning messages this generates\&.
+.PP
+Example:
+.PP
+Let's say you have a terabyte of local disk, full of changing
+production data\&.  You need to maintain an offsite copy of this 
+data in case the
+building burns down\&.  You don't want to buy a tape robot or manage 
+dozens of tapes or
+removable disk drives, and you don't want to pay a lot for network
+bandwidth to a remote mirror site\&. 
+.PP
+So you plug a single, cheap, 250 gigabyte external USB drive into your 
+local machine.  You install a terabyte of cheap disk in the remote
+machine, connect the remote machine to the Internet with a slow DSL line,
+and run this on the local machine (assume the USB drive is mounted as
+/mnt/usb):
+
+.nf 
+ 
+   local $ rsync \-e ssh \-\-offline \-\-write\-batch=/mnt/usb/foo \\
+   		\-az /sourcedir/ remote:/destdir/
+
+.fi 
+
+You then use the station wagon (or FedEx) to carry the USB drive to
+the remote location\&.  Plug it into the remote machine, mount it as
+/mnt/usb, and say:
+
+.nf 
+
+   remote $ rsync \-\-read\-batch=/mnt/usb/foo \\
+   		\-az /destdir/ 
+
+.fi
+
+You can add \-\-delete to the above commands as well.
+.PP
+You might need
+to do this write/carry/read round\-trip a few times to get
+the remote machine fully updated the first time \-\- you'll know this
+is the case if your USB drive is filling up\&.  For this offsite
+backup example it is relatively safe to allow
+the removable media to fill up; just apply the (truncated) batch file to
+the destination, then generate a new (shorter) batch file, and repeat 
+the cycle until the destination is fully up to date and the USB drive
+stops filling each time\&.   While this is going on, the remote
+machine will only be partially updated, just as if you cut off an
+rsync transfer over the net in mid\-stream, so keep this in mind\&.
+.PP
+Once the initial update is done, you'll likely find that the batch
+file size is quite small each trip, depending on the change rate 
+of your data and how often you update\&.  For most organizations a
+single 250
+gigabyte USB drive should be much more than enough to do weekly offsite
+backups of more than a terabyte of data\&.
+.PP
+For simplicity, this example ignores weekly, monthly, and
+yearly archives\&.  You'll want those \-\- depending on your needs,
+you should in most cases be able to maintain them on your 
+terabyte of disk, alongside the live data, with local rsync runs: 
+see the \-\-link\-dest flag.  
+
 .PP 
 .SH "SYMBOLIC LINKS" 
 .PP 



Steve
-- 
Stephen G. Traugott  (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt at TerraLuna.Org 
http://www.stevegt.com -- http://Infrastructures.Org