[PATCH] allow to change the block size used to handle sparse files

Andrea Righi righi.andrea at gmail.com
Sun Mar 23 10:51:08 GMT 2008


In some filesystems, typically optimized for large I/O throughputs (like
IBM GPFS, IBM SAN FS, or distributed filesystems in general) a lot of
lseek() operations can strongly impact on performances. In this cases it
can be helpful to enlarge the block size used to handle sparse files
directly from a command line parameter.

For example, using a sparse write size of 32KB, I've been able to
increase the transfer rate of an order of magnitude copying the output
files of scientific applications from GPFS to GPFS or GPFS to SAN FS.

-Andrea

---
Allow to change the block size used to handle sparse files.

 fileio.c  |    3 ++-
 options.c |    9 +++++++++
 rsync.yo  |   10 ++++++++++
 3 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/fileio.c b/fileio.c
index f086494..39cae92 100644
--- a/fileio.c
+++ b/fileio.c
@@ -26,6 +26,7 @@
 #endif
 
 extern int sparse_files;
+extern long sparse_files_block_size;
 
 static char last_byte;
 static size_t sparse_seek = 0;
@@ -115,7 +116,7 @@ int write_file(int f,char *buf,size_t len)
 	while (len > 0) {
 		int r1;
 		if (sparse_files > 0) {
-			int len1 = MIN(len, SPARSE_WRITE_SIZE);
+			int len1 = MIN(len, (size_t)sparse_files_block_size);
 			r1 = write_sparse(f, buf, len1);
 		} else {
 			if (!wf_writeBuf) {
diff --git a/options.c b/options.c
index f2d23f6..aaffdc7 100644
--- a/options.c
+++ b/options.c
@@ -73,6 +73,7 @@ int remove_source_files = 0;
 int one_file_system = 0;
 int protocol_version = PROTOCOL_VERSION;
 int sparse_files = 0;
+long sparse_files_block_size = SPARSE_WRITE_SIZE;
 int do_compression = 0;
 int def_compress_level = Z_DEFAULT_COMPRESSION;
 int am_root = 0; /* 0 = normal, 1 = root, 2 = --super, -1 = --fake-super */
@@ -358,6 +359,7 @@ void usage(enum logcode F)
   rprintf(F,"     --fake-super            store/recover privileged attrs using xattrs\n");
 #endif
   rprintf(F," -S, --sparse                handle sparse files efficiently\n");
+  rprintf(F," -U, --sparse-block=SIZE     force a fixed block size to handle sparse files\n");
   rprintf(F," -n, --dry-run               perform a trial run with no changes made\n");
   rprintf(F," -W, --whole-file            copy files whole (without delta-xfer algorithm)\n");
   rprintf(F," -x, --one-file-system       don't cross filesystem boundaries\n");
@@ -540,6 +542,7 @@ static struct poptOption long_options[] = {
   {"sparse",          'S', POPT_ARG_VAL,    &sparse_files, 1, 0, 0 },
   {"no-sparse",        0,  POPT_ARG_VAL,    &sparse_files, 0, 0, 0 },
   {"no-S",             0,  POPT_ARG_VAL,    &sparse_files, 0, 0, 0 },
+  {"sparse-block",    'U', POPT_ARG_LONG,   &sparse_files_block_size, 0, 0, 0 },
   {"inplace",          0,  POPT_ARG_VAL,    &inplace, 1, 0, 0 },
   {"no-inplace",       0,  POPT_ARG_VAL,    &inplace, 0, 0, 0 },
   {"append",           0,  POPT_ARG_NONE,   0, OPT_APPEND, 0, 0 },
@@ -1875,6 +1878,12 @@ void server_options(char **args, int *argc_p)
 		args[ac++] = arg;
 	}
 
+	if (sparse_files_block_size) {
+		if (asprintf(&arg, "-U%lu", sparse_files_block_size) < 0)
+			goto oom;
+		args[ac++] = arg;
+	}
+
 	if (io_timeout) {
 		if (asprintf(&arg, "--timeout=%d", io_timeout) < 0)
 			goto oom;
diff --git a/rsync.yo b/rsync.yo
index 6bbf400..0d9c067 100644
--- a/rsync.yo
+++ b/rsync.yo
@@ -352,6 +352,7 @@ to the detailed description below for a complete description.  verb(
      --super                 receiver attempts super-user activities
      --fake-super            store/recover privileged attrs using xattrs
  -S, --sparse                handle sparse files efficiently
+ -U, --sparse-block=SIZE     force a fixed block size to handle sparse files
  -n, --dry-run               perform a trial run with no changes made
  -W, --whole-file            copy files whole (w/o delta-xfer algorithm)
  -x, --one-file-system       don't cross filesystem boundaries
@@ -1039,6 +1040,15 @@ NOTE: Don't use this option when the destination is a Solaris "tmpfs"
 filesystem. It doesn't seem to handle seeks over null regions
 correctly and ends up corrupting the files.
 
+dit(bf(-U, --sparse-block=SIZE)) Change the block size used to handle sparse
+files. This options is used only in combination with bf(-S, --sparse). The
+default block size used by rsync to detect a file hole is 1024 bytes; when the
+receiver writes data to the destination file and option bf(-S, --sparse) is
+used, rsync checks every 1024-bytes chunk to detect if they are actually filled
+with data or not. With certain filesystems, optimized to receive data streams
+for example, enlarging this block size can strongly increase performance. The
+option bf(-U, --sparse-block=SIZE) can be used to tune this block size.
+
 dit(bf(-n, --dry-run)) This makes rsync perform a trial run that doesn't
 make any changes (and produces mostly the same output as a real run).  It
 is most commonly used in combination with the bf(-v, --verbose) and/or


More information about the rsync mailing list