duplicated file removal: call for comment

jw schultz jw at pegasys.ws
Thu Feb 13 08:42:28 EST 2003


On Wed, Feb 12, 2003 at 01:00:35PM +0100, Thomas Osterried wrote:
> I fully agree with jw schultz's first and second issue, to his --delete
> assumption and to the the point that lexical order does not matter.
> 
> > This unfortunately does mean that a means of preserving
> > initial sequence must be incorporated or the qsort approach
> > to finding duplicates would have to be forgotten.
> > This could be as simple as running qsort on an array of
> > indices to flist->files instead of flist->files itself.
> 
> qsort is a problem and it's clear why.
> i just wonder, why in real-life it always woked fine for me ;)
> 
> > > Another concern I have about this fix in 2.5.6 is that there is risk
> > > the change is not backward compatible with earlier protocol versions.
> > > The file list is sent (unsorted and uncleaned) from the sender to the
> > > receiver, and each side then sorts and cleans the list.
> > > Remember that the files are referred to as an
> > > integer index into the sorted file list, and the receiver skips
> > > NULL (duplicate) files.
> > > I suspect (but haven't checked) that if a 2.5.5 receiver is talking to
> > > a 2.5.6 sender then 2.5.5 will send the index for the 3rd file, which
> > > will be null_file on 2.5.6.
> > FYI, I just ran a test, and indeed, this causes a seg fault.  I'm
> > working up a patch.
> 
> I initially thought the file deletion is done by the sending site,
> before sending the integer index list to the receiving rsync.. The
> problem is more complex than it initially seemed.
> 
> rsync2.5.5 to 2.5.6 could generally request either the last _or_ last-1
> file, depending on the files in the tree [last-1 for e.g., when given 4
> source trees with a missing correspondending file in tree2].
> 
> > This also means that if we change which file gets removed (switching
> > back to leaving the last name) we'd need to bump the protocol number
> > and add some compatibility code for older versions.  Ick.
> 
> Since there are some problems with this issue, an increased pid could be
> a good idea.

OK.  I think i might have an approach.  My idea is to use a
slightly different routine for comparisons by qsort().  This
would never treat two files as equal.  As a fall-back use
their position in the array as a basis for comparison.  This
will preserve the command-line order of otherwise equal
files.

Because this is a small adjustment in the qsort the
duplicate selection will be non-deterministic on a qsort
version mismatch.

I've attached a patch i _think_ will do this.  It includes
code to do fallback for deleted files based on the
assumption that modtime will not be NULL if the file exists.
I have confirmed it compiles and make check doesn't fail.

PS. inlining file_compare() in qsort_file_compare would be a
good idea.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt
-------------- next part --------------
Index: flist.c
===================================================================
RCS file: /cvsroot/rsync/flist.c,v
retrieving revision 1.130
diff -u -b -r1.130 flist.c
--- flist.c	12 Feb 2003 09:15:23 -0000	1.130
+++ flist.c	12 Feb 2003 21:40:50 -0000
@@ -1148,6 +1148,13 @@
 	return u_strcmp(f_name(*f1), f_name(*f2));
 }
 
+int qsort_file_compare(struct file_struct **f1,struct file_struct **f2)
+{
+	int rv;
+	rv = file_compare(f1, f2);
+	if (!rv) rv = (*f1 < *f2) ? -1 : 1;
+	return rv;
+}
 
 int flist_find(struct file_list *flist, struct file_struct *f)
 {
@@ -1249,17 +1256,19 @@
 static void clean_flist(struct file_list *flist, int strip_root, int no_dups)
 {
 	int i;
+	int remove_i, prev_i;
 	char *name, *prev_name = NULL;
 
 	if (!flist || flist->count == 0)
 		return;
 
 	qsort(flist->files, flist->count,
-	      sizeof(flist->files[0]), (int (*)()) file_compare);
+	      sizeof(flist->files[0]), (int (*)()) qsort_file_compare);
 
 	for (i = no_dups? 0 : flist->count; i < flist->count; i++) {
 		if (flist->files[i]->basename) {
 			prev_name = f_name(flist->files[i]);
+			prev_i = i;
 			break;
 		}
 	}
@@ -1273,14 +1282,23 @@
 					"removing duplicate name %s from file list %d\n",
 					name, i);
 			}
+			if (flist->files[i]->modtime)
+			{
+				remove_i = prev_i;
+				prev_i = i;
+			} else {
+				remove_i = i;
+			}
 			/* it's not great that the flist knows the semantics of
 			 * the file memory usage, but i'd rather not add a flag
 			 * byte to that struct.
 			 * XXX can i use a bit in the flags field? */
 			if (flist->string_area)
-				flist->files[i][0] = null_file;
+				flist->files[remove_i][0] = null_file;
 			else
-				free_file(flist->files[i]);
+				free_file(flist->files[remove_i]);
+		} else {
+			prev_i = i;
 		}
 		prev_name = name;
 	}


More information about the rsync mailing list