duplicated file removal: call for comment
jw schultz
jw at pegasys.ws
Thu Feb 13 08:42:28 EST 2003
On Wed, Feb 12, 2003 at 01:00:35PM +0100, Thomas Osterried wrote:
> I fully agree with jw schultz's first and second issue, to his --delete
> assumption and to the the point that lexical order does not matter.
>
> > This unfortunately does mean that a means of preserving
> > initial sequence must be incorporated or the qsort approach
> > to finding duplicates would have to be forgotten.
> > This could be as simple as running qsort on an array of
> > indices to flist->files instead of flist->files itself.
>
> qsort is a problem and it's clear why.
> i just wonder, why in real-life it always woked fine for me ;)
>
> > > Another concern I have about this fix in 2.5.6 is that there is risk
> > > the change is not backward compatible with earlier protocol versions.
> > > The file list is sent (unsorted and uncleaned) from the sender to the
> > > receiver, and each side then sorts and cleans the list.
> > > Remember that the files are referred to as an
> > > integer index into the sorted file list, and the receiver skips
> > > NULL (duplicate) files.
> > > I suspect (but haven't checked) that if a 2.5.5 receiver is talking to
> > > a 2.5.6 sender then 2.5.5 will send the index for the 3rd file, which
> > > will be null_file on 2.5.6.
> > FYI, I just ran a test, and indeed, this causes a seg fault. I'm
> > working up a patch.
>
> I initially thought the file deletion is done by the sending site,
> before sending the integer index list to the receiving rsync.. The
> problem is more complex than it initially seemed.
>
> rsync2.5.5 to 2.5.6 could generally request either the last _or_ last-1
> file, depending on the files in the tree [last-1 for e.g., when given 4
> source trees with a missing correspondending file in tree2].
>
> > This also means that if we change which file gets removed (switching
> > back to leaving the last name) we'd need to bump the protocol number
> > and add some compatibility code for older versions. Ick.
>
> Since there are some problems with this issue, an increased pid could be
> a good idea.
OK. I think i might have an approach. My idea is to use a
slightly different routine for comparisons by qsort(). This
would never treat two files as equal. As a fall-back use
their position in the array as a basis for comparison. This
will preserve the command-line order of otherwise equal
files.
Because this is a small adjustment in the qsort the
duplicate selection will be non-deterministic on a qsort
version mismatch.
I've attached a patch i _think_ will do this. It includes
code to do fallback for deleted files based on the
assumption that modtime will not be NULL if the file exists.
I have confirmed it compiles and make check doesn't fail.
PS. inlining file_compare() in qsort_file_compare would be a
good idea.
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw at pegasys.ws
Remember Cernan and Schmitt
-------------- next part --------------
Index: flist.c
===================================================================
RCS file: /cvsroot/rsync/flist.c,v
retrieving revision 1.130
diff -u -b -r1.130 flist.c
--- flist.c 12 Feb 2003 09:15:23 -0000 1.130
+++ flist.c 12 Feb 2003 21:40:50 -0000
@@ -1148,6 +1148,13 @@
return u_strcmp(f_name(*f1), f_name(*f2));
}
+int qsort_file_compare(struct file_struct **f1,struct file_struct **f2)
+{
+ int rv;
+ rv = file_compare(f1, f2);
+ if (!rv) rv = (*f1 < *f2) ? -1 : 1;
+ return rv;
+}
int flist_find(struct file_list *flist, struct file_struct *f)
{
@@ -1249,17 +1256,19 @@
static void clean_flist(struct file_list *flist, int strip_root, int no_dups)
{
int i;
+ int remove_i, prev_i;
char *name, *prev_name = NULL;
if (!flist || flist->count == 0)
return;
qsort(flist->files, flist->count,
- sizeof(flist->files[0]), (int (*)()) file_compare);
+ sizeof(flist->files[0]), (int (*)()) qsort_file_compare);
for (i = no_dups? 0 : flist->count; i < flist->count; i++) {
if (flist->files[i]->basename) {
prev_name = f_name(flist->files[i]);
+ prev_i = i;
break;
}
}
@@ -1273,14 +1282,23 @@
"removing duplicate name %s from file list %d\n",
name, i);
}
+ if (flist->files[i]->modtime)
+ {
+ remove_i = prev_i;
+ prev_i = i;
+ } else {
+ remove_i = i;
+ }
/* it's not great that the flist knows the semantics of
* the file memory usage, but i'd rather not add a flag
* byte to that struct.
* XXX can i use a bit in the flags field? */
if (flist->string_area)
- flist->files[i][0] = null_file;
+ flist->files[remove_i][0] = null_file;
else
- free_file(flist->files[i]);
+ free_file(flist->files[remove_i]);
+ } else {
+ prev_i = i;
}
prev_name = name;
}
More information about the rsync
mailing list