include/exclude bug in rsync 2.6.0/2.6.1pre1

John Bowman bowman at math.ualberta.ca
Fri Apr 9 16:33:58 GMT 2004


As mentioned on the rsync home page, the --files-from=FILE option in rsync
version 2.6.0 is a useful option that allows one to "specify a list of
files to transfer, and can be much more efficient than a recursive descent
using include/exclude statements (if you know in advance what files you want to
transfer)". 

However, --files-from does not help one implement the --rsync-exclude=FILE
option previously submitted to this list (see the up-to-date patch below).
By definition, --rsync-exclude requires a recursive descent to determine the
file list, so it cannot be readily implemented with a wrapper. It requires
direct interaction with rsync's hierarchical exclude/include mechanism.

The following patch ports the rsync-exclude patch to rsync 2.6.1pre-1
and also fixes a bug, introduced in the 2.6.0 exclude/include handling,
that prevents included patterns in one list from overriding excluded
patterns in another. The bug becomes apparent on noting that the 0
return code from check_exclude in the include case is now simply ignored in
check_exclude_file (rather than preventing lists with lower precedence from
being examined, as was the case in earlier versions):
...
	if (exclude_list && check_exclude(exclude_list, fname, is_dir))
		return 1;
	if (local_exclude_list
	 && check_exclude(local_exclude_list, fname, is_dir))
		return 1;
...

If you look at the equivalent section of code in 2.5.7, the behaviour is
different (in the case of an included pattern, local_exclude_list is not
examined):

	if (exclude_list) {
		for (n=0; exclude_list[n]; n++) {
			ent = exclude_list[n];
			if (check_one_exclude(name, ent, st)) {
				report_exclude_result(name, ent, st);
				return !ent->include;
			}
		}
	}

	if (local_exclude_list) {
		for (n=0; local_exclude_list[n]; n++) {
			ent = local_exclude_list[n];
			if (check_one_exclude(name, ent, st)) {
				report_exclude_result(name, ent, st);
				return !ent->include;
			}
		}
	}
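
The difference between the two code fragments above can be reproduced with a
small stand-alone model. This is only a sketch under stated assumptions, not
rsync code: struct rule, check_list_260, excluded_260 and excluded_257 are
hypothetical names, the lists are terminated by a NULL pattern, and patterns
are compared with strcmp instead of rsync's wildcard matching.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for rsync's exclude_struct. */
struct rule {
	const char *pattern;	/* exact name only; rsync itself matches wildcards */
	int include;		/* 1 = include rule, 0 = exclude rule */
};

/* Models the 2.6.0-style per-list check: returns 1 if the first matching
 * rule excludes NAME, and 0 both for "included" and for "no match". */
static int check_list_260(const struct rule *list, const char *name)
{
	assert(list != NULL && name != NULL);
	for (; list->pattern != NULL; list++)
		if (strcmp(list->pattern, name) == 0)
			return !list->include;
	return 0;
}

/* 2.6.0-style chaining: a 0 from the first list cannot stop the search,
 * so an include match there is silently ignored. */
static int excluded_260(const struct rule *global, const struct rule *local,
			const char *name)
{
	if (check_list_260(global, name))
		return 1;
	if (check_list_260(local, name))
		return 1;
	return 0;
}

/* 2.5.7-style search: the first matching rule in either list decides,
 * so an include match in the first list ends the search. */
static int excluded_257(const struct rule *global, const struct rule *local,
			const char *name)
{
	const struct rule *r;

	assert(global != NULL && local != NULL && name != NULL);
	for (r = global; r->pattern != NULL; r++)
		if (strcmp(r->pattern, name) == 0)
			return !r->include;
	for (r = local; r->pattern != NULL; r++)
		if (strcmp(r->pattern, name) == 0)
			return !r->include;
	return 0;
}
```

With an include rule for "core" in the first list and an exclude rule for
"core" in the second, excluded_257 reports the file as included, while
excluded_260 falls through to the second list and reports it as excluded.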

To further complicate things, in both versions the lists are searched in an
order that is not what one would expect: exclude_list is consulted before
local_exclude_list. The same applies to the search order within each list in
check_exclude, where the first matching entry wins; an entry at the end of
the list should instead override a previous entry.

It seems reasonable that if there are multiple matching patterns, the most
local and most recent matching pattern will be used, in this order:
--cvs-exclude, --exclude.
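
The precedence rule just described can be sketched as follows. This is an
illustration under assumptions, not the patch itself: struct rule, match_last
and is_excluded are hypothetical names, list lengths are passed explicitly,
and strcmp stands in for rsync's pattern matching. Each list is scanned from
the end so that the most recent entry wins, and the three-way result lets an
include match in the more local list stop the search.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for rsync's exclude_struct. */
struct rule {
	const char *pattern;	/* exact name only; rsync itself matches wildcards */
	int include;		/* 1 = include rule, 0 = exclude rule */
};

/* Return +1 if the most recent matching rule includes NAME, -1 if it
 * excludes NAME, and 0 if no rule in the list matches. */
static int match_last(const struct rule *list, size_t n, const char *name)
{
	assert(name != NULL);
	while (n-- > 0)
		if (strcmp(list[n].pattern, name) == 0)
			return list[n].include ? 1 : -1;
	return 0;
}

/* Consult the more local list first; only an explicit "no match" (0)
 * lets the search continue with the global list. */
static int is_excluded(const struct rule *local, size_t nlocal,
		       const struct rule *global, size_t nglobal,
		       const char *name)
{
	int rc;

	if ((rc = match_last(local, nlocal, name)) != 0)
		return rc < 0;
	if ((rc = match_last(global, nglobal, name)) != 0)
		return rc < 0;
	return 0;
}
```

Here the local list plays the role of the --cvs-exclude patterns and the
global list that of the --exclude patterns: a local include overrides a global
exclude, and within one list a later entry overrides an earlier one.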

As a result of the above issues, an include pattern in local_exclude_list
(cvs-exclude) will not override a global exclude pattern in exclude_list,
contrary to what one would expect. The patch below fixes all of these problems
and adds the very flexible --rsync-exclude=FILE feature (useful for rsync-based
backups; e.g. see www.math.ualberta.ca/imaging/rlbackup).

I would very much appreciate it if this patch were incorporated into the next
release of rsync to fix the unexpected behaviour described above and also
to reduce the amount of post-release maintenance required to port
--rsync-exclude to new releases. (Feel free to rename it
--recursive-exclude or some such thing.)

-- John Bowman
University of Alberta

This is a patch to add an --rsync-exclude=FILE option to rsync-2.6.1pre-1.
In any given directory, patterns listed in FILE can be used to recursively
exclude/include files in that directory and all of its descendants. Prefixing
a file name with "+ " will force inclusion of the file. If there are 
multiple matching patterns, the most local and most recent matching pattern
will be used, in this order: --rsync-exclude, --cvs-exclude, --exclude. 

rsync --rsync-exclude=.rsync -vaxH /here /there

will copy all files from /here to /there, excluding any files matched by
patterns in a file named .rsync. Each .rsync file applies to the directory
containing it and to all of that directory's subdirectories.
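
The per-directory scoping can be pictured with a small model. Note that this
sketch swaps in a different technique from the patch: the patch copies the
inherited list on entering a directory and restores the parent's list
afterwards, whereas the sketch below keeps one stack and truncates it on
leaving. All names (rule_stack, enter_dir, push_rule, leave_dir, is_excluded,
MAX_RULES) are hypothetical, and strcmp stands in for rsync's pattern
matching.

```c
#include <assert.h>
#include <string.h>

#define MAX_RULES 64	/* arbitrary capacity for this sketch */

/* Hypothetical, simplified stand-in for rsync's exclude_struct. */
struct rule {
	const char *pattern;	/* exact name only; rsync itself matches wildcards */
	int include;		/* 1 = include rule, 0 = exclude rule */
};

struct rule_stack {
	struct rule rules[MAX_RULES];
	size_t len;		/* number of rules currently in effect */
};

/* Entering a directory: remember how many rules were inherited. */
static size_t enter_dir(const struct rule_stack *s)
{
	return s->len;
}

/* Add one rule read from the directory's exclude file.
 * Returns 0 on success, -1 if the stack is full. */
static int push_rule(struct rule_stack *s, const char *pattern, int include)
{
	if (s->len >= MAX_RULES)
		return -1;
	s->rules[s->len].pattern = pattern;
	s->rules[s->len].include = include;
	s->len++;
	return 0;
}

/* Leaving a directory: drop every rule added since the matching enter_dir. */
static void leave_dir(struct rule_stack *s, size_t mark)
{
	assert(mark <= s->len);
	s->len = mark;
}

/* The most recently added (deepest, latest) matching rule decides. */
static int is_excluded(const struct rule_stack *s, const char *name)
{
	size_t n = s->len;

	while (n-- > 0)
		if (strcmp(s->rules[n].pattern, name) == 0)
			return !s->rules[n].include;
	return 0;
}
```

A rule pushed in /here stays in effect for every subdirectory visited before
the matching leave_dir, and a subdirectory can override it with its own
"+ " entry without affecting its siblings.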

This feature has advantages over --cvs-exclude for backing up large file
systems since the .cvsignore files only apply to the current directory: 
unless the .cvsignore restrictions apply to the entire tree they must be
duplicated in each subdirectory. Furthermore, the --cvs-exclude option
is not intended for general system backups (for example, unless the default
list is cleared with "!", it automatically excludes *.a and *.so libraries).

diff -ru rsync-2.6.1pre-1/exclude.c rsync-2.6.1pre-1J/exclude.c
--- rsync-2.6.1pre-1/exclude.c	2004-02-23 20:23:53.000000000 +0100
+++ rsync-2.6.1pre-1J/exclude.c	2004-04-08 10:41:54.000000000 +0200
@@ -197,35 +197,39 @@
 
 static void report_exclude_result(char const *name,
                                   struct exclude_struct const *ent,
-                                  int name_is_dir)
+                                  int name_is_dir, const char *type)
 {
 	/* If a trailing slash is present to match only directories,
 	 * then it is stripped out by make_exclude.  So as a special
 	 * case we add it back in here. */
 
 	if (verbose >= 2) {
-		rprintf(FINFO, "[%s] %s %s %s because of pattern %s%s\n",
+		rprintf(FINFO, "[%s] %s %s %s because of %s pattern %s%s\n",
 			who_am_i(),
 			ent->include ? "including" : "excluding",
 			name_is_dir ? "directory" : "file",
-			name, ent->pattern,
+			name, type, ent->pattern,
 			ent->directory ? "/" : "");
 	}
 }
 
 
 /*
- * Return true if file NAME is defined to be excluded by either
- * LOCAL_EXCLUDE_LIST or the globals EXCLUDE_LIST.
+  * Return -1 (+1) if file NAME is defined to be excluded (included), according
+  * to the most recent matching pattern in list. Otherwise return 0;
  */
-int check_exclude(struct exclude_struct **list, char *name, int name_is_dir)
+int check_exclude(struct exclude_struct **list, char *name, int name_is_dir,
+		  const char *type)
 {
 	struct exclude_struct *ent;
 
-	while ((ent = *list++) != NULL) {
+	int n;
+	for (n=0; list[n]; n++) ;
+	for (n--; n >= 0; n--) {
+		ent = list[n];
 		if (check_one_exclude(name, ent, name_is_dir)) {
-			report_exclude_result(name, ent, name_is_dir);
-			return !ent->include;
+			report_exclude_result(name, ent, name_is_dir, type);
+			return (ent->include ? 1 : -1);
 		}
 	}
 
diff -ru rsync-2.6.1pre-1/flist.c rsync-2.6.1pre-1J/flist.c
--- rsync-2.6.1pre-1/flist.c	2004-02-11 03:48:58.000000000 +0100
+++ rsync-2.6.1pre-1J/flist.c	2004-04-08 10:50:46.000000000 +0200
@@ -39,6 +39,7 @@
 extern int numeric_ids;
 
 extern int cvs_exclude;
+extern const char *rsync_exclude;
 
 extern int recurse;
 extern char curr_dir[MAXPATHLEN];
@@ -66,6 +67,7 @@
 extern struct exclude_struct **exclude_list;
 extern struct exclude_struct **server_exclude_list;
 extern struct exclude_struct **local_exclude_list;
+static struct exclude_struct **recur_local_exclude_list;
 
 int io_error;
 
@@ -210,6 +212,7 @@
  */
 static int check_exclude_file(char *fname, int is_dir, int exclude_level)
 {
+  int rc;
 #if 0 /* This currently never happens, so avoid a useless compare. */
 	if (exclude_level == NO_EXCLUDES)
 		return 0;
@@ -225,16 +228,24 @@
 				return 0;
 		}
 	}
-	if (server_exclude_list
-	 && check_exclude(server_exclude_list, fname, is_dir))
-		return 1;
+	/* Precedence: use the most local and most recent matching pattern,
+	   in this order: server, --rsync-exclude, --cvs-exclude, --exclude */
+	if (server_exclude_list &&
+	    (rc=check_exclude(server_exclude_list, fname, is_dir, "server")))
+		return (rc < 0);
 	if (exclude_level != ALL_EXCLUDES)
 		return 0;
-	if (exclude_list && check_exclude(exclude_list, fname, is_dir))
-		return 1;
-	if (local_exclude_list
-	 && check_exclude(local_exclude_list, fname, is_dir))
-		return 1;
+	if (recur_local_exclude_list &&
+	    (rc=check_exclude(recur_local_exclude_list, fname, is_dir,
+			      "rsync-exclude")))
+		return (rc < 0);
+	if (local_exclude_list &&
+	    (rc=check_exclude(local_exclude_list, fname, is_dir,
+				 "cvs-exclude")))
+		return (rc < 0);
+	if (exclude_list &&
+	    (rc=check_exclude(exclude_list, fname, is_dir, "exclude")))
+		return (rc < 0);
 	return 0;
 }
 
@@ -503,7 +514,32 @@
 	io_write_phase = "unknown";
 }
 
+static struct exclude_struct **copy_exclude_list(struct exclude_struct **from) {
+	struct exclude_struct **to;
+	int i;
+	int len=0;
+	int size;
+	
+	if (!from) return NULL;
+	
+	for (; from[len]; len++) ;
+	size=sizeof(struct exclude_struct *)*(len+1);
+	to = (struct exclude_struct **) malloc(size);
+	if (!to) out_of_memory("copy_exclude_list");
+
+	size=sizeof(struct exclude_struct);
+	for (i=0; i < len; i++) {
+	        struct exclude_struct *p;
+		p=to[i]=(struct exclude_struct *) malloc(size);
+		if (!p) out_of_memory("copy_exclude_list");
+		*p=*from[i];
+		p->pattern=strdup(from[i]->pattern);
+		if (!p->pattern) out_of_memory("copy_exclude_list");
+	}
+	to[len]=NULL;
 
+	return to;
+}
 
 void receive_file_entry(struct file_struct **fptr, unsigned short flags,
     struct file_list *flist, int f)
@@ -925,8 +961,11 @@
 	if (recursive && S_ISDIR(file->mode)
 	    && !(file->flags & FLAG_MOUNT_POINT)) {
 		struct exclude_struct **last_exclude_list = local_exclude_list;
+ 		struct exclude_struct **recur_last_exclude_list =
+ 		    recur_local_exclude_list;
 		send_directory(f, flist, f_name_to(file, fbuf));
 		local_exclude_list = last_exclude_list;
+ 		recur_local_exclude_list = recur_last_exclude_list;
 		return;
 	}
 }
@@ -963,6 +1002,7 @@
 	}
 
 	local_exclude_list = NULL;
+	recur_local_exclude_list = copy_exclude_list(recur_local_exclude_list);
 
 	if (cvs_exclude) {
 		if (strlcpy(p, ".cvsignore", MAXPATHLEN - offset)
@@ -976,6 +1016,18 @@
 		}
 	}
 
+	if (rsync_exclude) {
+		if (strlen(fname) + strlen(rsync_exclude) <= MAXPATHLEN - 1) {
+			strcpy(p, rsync_exclude);
+			add_exclude_file(&recur_local_exclude_list,fname,MISSING_OK,ADD_EXCLUDE);
+		} else {
+			io_error = 1;
+			rprintf(FINFO,
+				"cannot rsync-exclude in long-named directory %s\n",
+				fname);
+		}
+	}
+
 	for (errno = 0, di = readdir(d); di; errno = 0, di = readdir(d)) {
 		char *dname = d_name(di);
 		if (dname[0] == '.' && (dname[1] == '\0'
@@ -999,6 +1051,10 @@
 	if (local_exclude_list)
 		free_exclude_list(&local_exclude_list); /* Zeros pointer too */
 
+	if (recur_local_exclude_list) {
+		free_exclude_list(&recur_local_exclude_list);
+	}
+
 	closedir(d);
 }
 
@@ -1022,6 +1078,8 @@
 	if (show_filelist_p() && f != -1)
 		start_filelist_progress("building file list");
 
+	recur_local_exclude_list = NULL;
+	
 	start_write = stats.total_written;
 
 	flist = flist_new(f == -1 ? WITHOUT_HLINK : WITH_HLINK,
diff -ru rsync-2.6.1pre-1/options.c rsync-2.6.1pre-1J/options.c
--- rsync-2.6.1pre-1/options.c	2004-02-22 09:56:43.000000000 +0100
+++ rsync-2.6.1pre-1J/options.c	2004-04-08 10:11:13.000000000 +0200
@@ -47,6 +47,7 @@
 int update_only = 0;
 int cvs_exclude = 0;
 int dry_run = 0;
+const char *rsync_exclude = NULL;
 int local_server = 0;
 int ignore_times = 0;
 int delete_mode = 0;
@@ -267,6 +268,7 @@
   rprintf(F," -P                          equivalent to --partial --progress\n");
   rprintf(F," -z, --compress              compress file data\n");
   rprintf(F," -C, --cvs-exclude           auto ignore files in the same way CVS does\n");
+  rprintf(F,"     --rsync-exclude=FILE    recursively exclude patterns locally listed in FILE\n");
   rprintf(F,"     --exclude=PATTERN       exclude files matching PATTERN\n");
   rprintf(F,"     --exclude-from=FILE     exclude patterns listed in FILE\n");
   rprintf(F,"     --include=PATTERN       don't exclude files matching PATTERN\n");
@@ -333,6 +335,7 @@
   {"dry-run",         'n', POPT_ARG_NONE,   &dry_run, 0, 0, 0 },
   {"sparse",          'S', POPT_ARG_NONE,   &sparse_files, 0, 0, 0 },
   {"cvs-exclude",     'C', POPT_ARG_NONE,   &cvs_exclude, 0, 0, 0 },
+  {"rsync-exclude",     0, POPT_ARG_STRING, &rsync_exclude, 0, 0, 0 },
   {"update",          'u', POPT_ARG_NONE,   &update_only, 0, 0, 0 },
   {"links",           'l', POPT_ARG_NONE,   &preserve_links, 0, 0, 0 },
   {"copy-links",      'L', POPT_ARG_NONE,   &copy_links, 0, 0, 0 },
diff -ru rsync-2.6.1pre-1/proto.h rsync-2.6.1pre-1J/proto.h
--- rsync-2.6.1pre-1/proto.h	2004-02-18 00:13:06.000000000 +0100
+++ rsync-2.6.1pre-1J/proto.h	2004-04-07 11:42:21.000000000 +0200
@@ -52,7 +52,8 @@
 void setup_protocol(int f_out,int f_in);
 int claim_connection(char *fname,int max_connections);
 void free_exclude_list(struct exclude_struct ***listp);
-int check_exclude(struct exclude_struct **list, char *name, int name_is_dir);
+int check_exclude(struct exclude_struct **list, char *name, int name_is_dir,
+		  const char *type);
 void add_exclude(struct exclude_struct ***listp, const char *pattern, int include);
 void add_exclude_file(struct exclude_struct ***listp, const char *fname,
 		      int fatal, int include);
diff -ru rsync-2.6.1pre-1/rsync.yo rsync-2.6.1pre-1J/rsync.yo
--- rsync-2.6.1pre-1/rsync.yo	2004-03-24 22:58:50.000000000 +0100
+++ rsync-2.6.1pre-1J/rsync.yo	2004-04-07 11:42:21.000000000 +0200
@@ -327,6 +327,7 @@
  -P                          equivalent to --partial --progress
  -z, --compress              compress file data
  -C, --cvs-exclude           auto ignore files in the same way CVS does
+     --rsync-exclude=FILE    recursively exclude patterns locally listed in FILE
      --exclude=PATTERN       exclude files matching PATTERN
      --exclude-from=FILE     exclude patterns listed in FILE
      --include=PATTERN       don't exclude files matching PATTERN
@@ -645,6 +646,13 @@
 .cvsignore file and matches one of the patterns listed therein.  See
 the bf(cvs(1)) manual for more information.
 
+dit(bf(--rsync-exclude=FILE)) In any given directory, patterns listed in
+FILE are excluded from the file lists associated with that directory
+and all of its descendants. Prefixing the file name with "+ " will force
+inclusion of the file. If there are multiple matching patterns, the most
+local and most recent matching pattern will be used, in this order:
+--rsync-exclude, --cvs-exclude, --exclude.
+
 dit(bf(--exclude=PATTERN)) This option allows you to selectively exclude
 certain files from the list of files to be transferred. This is most
 useful in combination with a recursive transfer.
diff -ru rsync-2.6.1pre-1/util.c rsync-2.6.1pre-1J/util.c
--- rsync-2.6.1pre-1/util.c	2004-02-18 00:13:10.000000000 +0100
+++ rsync-2.6.1pre-1J/util.c	2004-04-07 11:42:21.000000000 +0200
@@ -476,7 +476,7 @@
 	if (server_exclude_list) {
 		for (s = arg; (s = strchr(s, '/')) != NULL; ) {
 			*s = '\0';
-			if (check_exclude(server_exclude_list, arg, 1)) {
+			if (check_exclude(server_exclude_list, arg, 1, "server")) {
 				/* We must leave arg truncated! */
 				return 1;
 			}

