filelist calculation algorithm
Dave Dykstra
dwd at drdykstra.us
Tue Jan 7 22:44:00 EST 2003
We've been calling this option --files-from rather than --file-list,
to be like the GNU tar option.
On Sun, Jan 05, 2003 at 09:55:50AM -0800, Wayne Davison wrote:
> On Sat, Jan 04, 2003 at 05:03:02PM -0800, jw schultz wrote:
> > that would produce destloc/srcdir/....
> > when you might want a copy of srcdir at destloc instead of
> > in destloc.
>
> Ah yes, I _was_ missing something. However, I still don't think we need
> to clutter rsync with two types of --file-list options. This is already
> something that people have to deal with when using the --relative option:
> how to generate a file list that contains just the path information that
> we need to be significant. I think that the removal of the undesired
> prefixes should happen before the list gets to rsync rather than having
> rsync do it (in your example the user would just chdir into "srcdir" and
> do the "find" relative to '.').
I agree, there should only be one option.
...
> FYI, the old rsync release that had a type of file-list functionality
> was using a specialized include/exclude list. I believe that rsync
> still walked the entire directory tree on both sides, and applied the
> includes using a slightly different algorithm than the default (one that
> did not require parent directories to be mentioned to get down to all
> the specified files). I think that it would be nice to avoid the
> directory-tree traversal, so I don't think we want to go this route.
> However, this is another potential implementation method (and one that
> would result in a syntax that is like what you suggested: one that uses
> a single source dir on the command-line and doesn't require the use of
> the --relative option).
No, the include/exclude optimization completely bypassed the recursive
traversal. It kicked in whenever the list ended with --exclude '*'
and there were otherwise only includes with no wildcards.
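
To illustrate that trigger condition, here is a minimal sketch in Python.
It is not the actual rsync code: the function name and the (kind, pattern)
rule representation are made up for this example, and it assumes that at
least one include is present and that "wildcard" means any of '*', '?'
or '['.

```python
# Hypothetical sketch of the trigger condition, not rsync's real code.
WILDCARD_CHARS = set("*?[")  # assumption: these characters count as wildcards


def can_skip_traversal(rules):
    """Return True if the recursive traversal could be bypassed.

    rules is a list of (kind, pattern) tuples, where kind is
    'include' or 'exclude', in the order given on the command line.
    """
    if len(rules) < 2:
        # Assumption: need at least one include plus the final exclude.
        return False
    *includes, last = rules
    if last != ("exclude", "*"):
        return False
    # Everything before the final exclude must be a wildcard-free include.
    return all(
        kind == "include" and not (WILDCARD_CHARS & set(pattern))
        for kind, pattern in includes
    )
```

With a list like that, the includes themselves name every file to
transfer, so nothing has to be discovered by walking directories.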
On Sun, Jan 05, 2003 at 01:18:06PM -0800, jw schultz wrote:
> On Sun, Jan 05, 2003 at 12:44:32PM -0800, Wayne Davison wrote:
> > On Sun, Jan 05, 2003 at 11:55:22AM -0800, jw schultz wrote:
> > > The first problem is this would flatten things unless you used
> > > relative and forced the user's CWD. That would cause considerable
> > > confusion.
> >
> > Really? This is exactly how rsync works now with multiple file names on
> > the command-line, so I don't see this as being any more confusing than
> > what we already have. The rule would be you can specify the files on
> > the command-line or on stdin (if you use '-' as the only source file).
> > Since all names are treated in the same way regardless of where they
> > were specified, everything works the same as it did before, only more
> > names are now supported per invocation. I'm thinking that this way is
> > more flexible since it allows someone to flatten things if that's what
> > they really want to do.
>
> And the effect of that is that users wind up looping to produce
> scores of rsync sessions to transfer a single list.
>
> > > Secondly, how would you do it when the source location is remote?
> > > Many of the users asking for this are doing pulls.
> >
> > I mentioned a protocol change that would send the extra file names to
> > the other side after rsync starts up. Currently the send_files()
> > routine always sends names from the sending side to the receiving side.
> > The new protocol would change that to always send names from the user
> > side to the server side when this option was specified. The user's
> > command would look like this:
> >
> > rsync -avR remote:- /foo/bar
> >
> > The file list would be read from the local (user) side, of course. The
> > remote command being run by rsync would look like this:
> >
> > ssh remote rsync --server --sender -vlogDtprR . -
> >
> > The presence of the '-' as the source would tell us to slurp names
> > instead of send them.
> >
> > Since the file list is exchanged in total before we do any real work, I
> > think this change would actually be really easy to implement.
>
> How many levels down should we allow - to mean "use this
> directory as cwd for list"?
>
> rsync --relative remote::module/- dest/dir
>
> If it can only be "remote:-" then everything would have to
> be relative to the user's home directory.
> "../../usr/lib/somedir/somefile" anyone? And/or we
> allow absolute paths in the list. So much for safe-links.
>
> In terms of implementation i don't think we are that far
> apart. As it stands now we walk the source list. For each
> file/directory we check it against the pattern list prior to
> insertion. At the time of insertion if recursion is turned on
> each directory gets a readdir and the contents get the same
> test and insert treatment.
>
> You are replacing the source list with stdin. I'm basically
> saying that the list from stdin or from a file would be used
> instead of readdir.
>
> Both cases require a protocol bump to support sending the
> list for a pull.
>
> The discussion seems to be fruitful but i'd like to see more
> participants with other perspectives before i'd bookmark it
> as a TODO.
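
The file-list walk described in the quoted text above, and the idea of
using a supplied list instead of readdir, can be sketched roughly as
follows. This is only an illustration under stated assumptions: all of
the function names are invented here, fnmatch stands in for rsync's own
pattern matching (which has different rules), and only exclude patterns
are modelled.

```python
import fnmatch
import os
import posixpath


def excluded(path, exclude_patterns):
    # Stand-in for the pattern-list check; NOT rsync's matching semantics.
    return any(fnmatch.fnmatch(path, pat) for pat in exclude_patterns)


def build_file_list(sources, exclude_patterns, recursive, list_children):
    """Walk the source list, test each name, insert it, then expand it.

    list_children(path) supplies the entries to consider below a path.
    """
    result = []
    pending = list(reversed(sources))  # used as a stack, keeps input order
    while pending:
        path = pending.pop()
        if excluded(path, exclude_patterns):
            continue  # failed the pattern check, never inserted
        result.append(path)
        if recursive:
            # Children get the same test-and-insert treatment.
            pending.extend(reversed(list_children(path)))
    return result


def readdir_children(path):
    # Current behaviour in this sketch: read the directory itself.
    if os.path.isdir(path) and not os.path.islink(path):
        return [os.path.join(path, n) for n in sorted(os.listdir(path))]
    return []


def children_from_list(source_dir, names):
    # Proposed variant in this sketch: the supplied names replace readdir
    # for the source directory; nothing else is expanded.
    entries = [posixpath.join(source_dir, n) for n in names]
    return lambda path: entries if path == source_dir else []
```

For example, build_file_list(["srcdir"], [], True,
children_from_list("srcdir", names)) inserts srcdir and then exactly the
listed names, without reading any directory.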
I think the use of "-" in place of the source directory is likely to
be harder to implement and explain than having the list of files in
--files-from. A separate option also leaves stdin free to be used
for something else if someone wants that. There's already a precedent
for combining a source directory with --files-from in GNU tar.
- Dave