filelist calculation algorithm

Dave Dykstra dwd at drdykstra.us
Tue Jan 7 22:44:00 EST 2003


We've been calling this option --files-from rather than --file-list,
to be like the GNU tar option.

On Sun, Jan 05, 2003 at 09:55:50AM -0800, Wayne Davison wrote:
> On Sat, Jan 04, 2003 at 05:03:02PM -0800, jw schultz wrote:
> > that would produce destloc/srcdir/....
> > when you might want a copy of srcdir at destloc instead of
> > in destloc.
> 
> Ah yes, I _was_ missing something.  However, I still don't think we need
> to clutter rsync with two types of --file-list options.  This is already
> something that people have to deal with when using the --relative option:
> how to generate a file list that contains just the path information that
> we need to be significant.  I think that the removal of the undesired
> prefixes should happen before the list gets to rsync rather than having
> rsync do it (in your example the user would just chdir into "srcdir" and
> do the "find" relative to '.').

I agree, there should only be one option.


...
> FYI, the old rsync release that had a type of file-list functionality
> was using a specialized include/exclude list.  I believe that rsync
> still walked the entire directory tree on both sides, and applied the
> includes using a slightly different algorithm than the default (one that
> did not require parent directories to be mentioned to get down to all
> the specified files).  I think that it would be nice to avoid the
> directory-tree traversal, so I don't think we want to go this route.
> However, this is another potential implementation method (and one that
> would result in a syntax that is like what you suggested:  one that uses
> a single source dir on the command-line and doesn't require the use of
> the --relative option).

No, the include/exclude optimization completely bypassed the recursive
traversal.  It kicked in whenever the list ended with --exclude '*'
and there were otherwise only includes with no wildcards.
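
As a rough sketch of that condition (this is not rsync's actual C code; the
function name, the rule representation, and the choice of "*", "?" and "["
as the wildcard characters are assumptions made for illustration), the check
amounts to something like:

```python
# Hypothetical sketch of the "can we skip the recursive traversal?" test.
# Rules are modelled as (kind, pattern) tuples, kind being "include" or
# "exclude"; this representation is an assumption, not rsync's own.

WILDCARD_CHARS = set("*?[")  # assumed set of wildcard characters


def can_skip_traversal(rules):
    """Return True when the list ends with exclude '*' and every other
    rule is an include whose pattern contains no wildcard characters."""
    if not rules or rules[-1] != ("exclude", "*"):
        return False
    for kind, pattern in rules[:-1]:
        if kind != "include":
            return False
        if any(ch in WILDCARD_CHARS for ch in pattern):
            return False
    return True
```

When the test passes, the included names can be used directly as the file
list, so no directory has to be read to discover them.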



On Sun, Jan 05, 2003 at 01:18:06PM -0800, jw schultz wrote:
> On Sun, Jan 05, 2003 at 12:44:32PM -0800, Wayne Davison wrote:
> > On Sun, Jan 05, 2003 at 11:55:22AM -0800, jw schultz wrote:
> > > The first problem is this would flatten things unless you used
> > > relative and forced the user's CWD.  That would cause considerable
> > > confusion.
> > 
> > Really?  This is exactly how rsync works now with multiple file names on
> > the command-line, so I don't see this as being any more confusing than
> > what we already have.  The rule would be you can specify the files on
> > the command-line or on stdin (if you use '-' as the only source file).
> > Since all names are treated in the same way regardless of where they
> > were specified, everything works the same as it did before, only more
> > names are now supported per invocation.  I'm thinking that this way is
> > more flexible since it allows someone to flatten things if that's what
> > they really want to do.
> 
> And the effect of that is that users wind up looping to produce
> scores of rsync sessions to transfer a single list.
> 
> > > Secondly, how would you do it when the source location is remote?
> > > Many of the users asking for this are doing pulls.
> > 
> > I mentioned a protocol change that would send the extra file names to
> > the other side after rsync starts up.  Currently the send_files()
> > routine always sends names from the sending side to the receiving side.
> > The new protocol would change that to always send names from the user
> > side to the server side when this option was specified.  The user's
> > command would look like this:
> > 
> >     rsync -avR remote:- /foo/bar
> > 
> > The file list would be read from the local (user) side, of course.  The
> > remote command being run by rsync would look like this:
> > 
> >     ssh remote rsync --server --sender -vlogDtprR . -
> > 
> > The presence of the '-' as the source would tell us to slurp names
> > instead of send them.
> > 
> > Since the file list is exchanged in total before we do any real work, I
> > think this change would actually be really easy to implement.
> 
> How many levels down should we allow - to mean "use this
> directory as cwd for list"?
> 
> 	rsync --relative remote::module/- dest/dir
> 
> If it can only be "remote:-" then everything would have to
> be relative to the user's home directory.
> "../../usr/lib/somedir/somefile" anyone?  And/or we
> allow absolute paths in the list.  So much for safe-links.
> 
> In terms of implementation i don't think we are that far
> apart.  As it stands now we walk the source list.  For each
> file/directory we check it against the pattern list prior to
> insertion.  At the time of insertion if recursion is turned on
> each directory gets a readdir and the contents get the same
> test and insert treatment.
> 
> You are replacing the source list with stdin.  I'm basically
> saying that the list from stdin or from a file would be used
> instead of readdir.
> 
> Both cases require a protocol bump to support sending the
> list for a pull.
> 
> The discussion seems to be fruitful but i'd like to see more
> participants with other perspectives before i'd bookmark it
> as a TODO.


I think the use of "-" in place of the source directory is likely to
be harder to implement and explain than having the list of files given
by --files-from.  A separate option also leaves stdin free to be used
for something else if someone wants that.  There's already a precedent
in GNU tar for combining a source directory with --files-from.
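
To make the idea concrete, here is a minimal sketch of how a source
directory and a list of names could be combined without any readdir calls.
It is only an illustration: build_file_list is a hypothetical helper, and
rejecting absolute paths and names that climb out of the source directory
is an assumption about how the concerns raised above might be handled, not
a decided behaviour.

```python
import posixpath


def build_file_list(source_dir, names):
    """Pair each listed name with its path under source_dir.

    names is any iterable of lines (for example an open file or stdin).
    Returns a list of (relative_name, full_path) tuples in input order.
    """
    entries = []
    for raw in names:
        name = raw.strip()
        if not name:
            continue  # ignore blank lines
        rel = posixpath.normpath(name)
        # Assumed policy: names must stay inside source_dir.
        if posixpath.isabs(rel) or rel == ".." or rel.startswith("../"):
            raise ValueError("name escapes the source directory: " + name)
        entries.append((rel, posixpath.join(source_dir, rel)))
    return entries
```

Because the names come from a file or another stream rather than from the
source argument, stdin is only involved if the caller chooses to pass it in.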

- Dave


