specifying a list of files to transfer

Andrew J. Schorr aschorr at telemetry-investments.com
Fri Jan 17 17:15:00 EST 2003


On Thu, Jan 16, 2003 at 07:06:05PM -0800, jw schultz wrote:
> I know i'm not talking about when -R is used.  I am talking
> about creating implied intermediate directories without -R.
> I'm talking about being able to take the output of
> find -name '*.jpg' and have it create (if necessary) any
> intermediate directories while maintaining the equivalency
> of src and dest.  If that means also behaving as though
> those directories were already in the list that would be OK
> as long as -r weren't specified.
> 
> 	find . -name '*.jpg' | rsync -a --files-from=- .  remote:
> should when it hits
> 	./deltapics/031CGMUa.jpg
> 	./deltapics/031CGNga.jpg
> 	./deltapics/031CGOHa.jpg
> 	./deltapics/031CGPOa.jpg
> 	./deltapics/031CGPba.jpg
> create the deltapics directory if it doesn't exist.  The
> permissions and ownership should be derived from the source.
> so effectively it should be as though
> 	./deltapics
> were in the file list.  It needn't be updated if it
> does exist but if easier to implement it that way i wouldn't
> object.  In such a case even if -r is
> allowed and specified the implied directory should not defeat
> the file list by transferring any files not in the list.
> 
> No errors, no need to do a run to find the missing
> directories and add them and no need to add a filter to the
> stream adding entries for directories that are missing.

There are performance issues associated with sending all the
parent directories automatically.  Consider the situation where
running 'find test -name "*.jpg" -print' gives the following results
(and yes, this does happen, at least for me on Solaris 8, where the
output of find seems to depend on the order in which the directory
entries were created):

   test/foo.jpg
   test/bar.jpg
   test/sub/foo.jpg
   test/zeke.jpg

If I run rsync in such a way that parent directories are sent automatically,
it will send the following entries (based on -vvv output):

   make_file(4,test)
   make_file(4,test/foo.jpg)
   make_file(4,test/bar.jpg)
   make_file(4,test)
   make_file(4,test/sub)
   make_file(4,test/sub/foo.jpg)
   make_file(4,test)
   make_file(4,test/zeke.jpg)

Note that the "test" directory is sent 3 times in this case.  This is
because the code that checks whether to send the directory just compares
to the last one sent in an attempt to eliminate duplicates.  But this
is not a reliable way of preventing duplicates, as the above example
demonstrates.  So there is a danger of sending lots of duplicate
directory entries when the automatic directory transmission feature
is enabled.
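
The duplicate behavior described above can be imitated with a small
Python sketch.  This is not rsync's code: it assumes the check compares
each file's directory name with the previous file's directory name and,
when they differ, emits every parent directory again.  That assumption is
chosen because it reproduces the -vvv trace shown above; the function
names are hypothetical.

```python
import posixpath


def parent_dirs(path):
    """Return the parent directories of path, outermost first."""
    dirname = posixpath.dirname(path)
    if not dirname:
        return []
    parts = dirname.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def send_with_last_dir_check(files):
    """Model of the 'compare with the last directory sent' check (assumed)."""
    sent = []
    last_dir = None
    for path in files:
        dirname = posixpath.dirname(path)
        # Only the previous directory is remembered, so returning to an
        # earlier directory re-sends all of its parents.
        if dirname and dirname != last_dir:
            sent.extend(parent_dirs(path))
            last_dir = dirname
        sent.append(path)
    return sent


FILES = [
    "test/foo.jpg",
    "test/bar.jpg",
    "test/sub/foo.jpg",
    "test/zeke.jpg",
]

if __name__ == "__main__":
    for entry in send_with_last_dir_check(FILES):
        print(entry)
```

With the find output above as input, "test" appears three times in the
result, matching the trace.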

This could probably be fixed by keeping a hash table of all the directory
entries that have already been transmitted instead of just comparing
against the last one sent.
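
A minimal sketch of that idea in Python, using a set as the hash table.
It only illustrates the bookkeeping and is not a patch against rsync;
the function name is hypothetical.

```python
import posixpath


def send_with_seen_set(files):
    """Emit each parent directory once, tracking all directories sent so far."""
    sent = []
    seen_dirs = set()  # every directory already transmitted
    for path in files:
        parts = posixpath.dirname(path).split("/")
        # Walk the parents from outermost to innermost.
        for i in range(1, len(parts) + 1):
            directory = "/".join(parts[:i])
            if directory and directory not in seen_dirs:
                seen_dirs.add(directory)
                sent.append(directory)
        sent.append(path)
    return sent


FILES = [
    "test/foo.jpg",
    "test/bar.jpg",
    "test/sub/foo.jpg",
    "test/zeke.jpg",
]

if __name__ == "__main__":
    for entry in send_with_seen_set(FILES):
        print(entry)
```

For the same four files, "test" and "test/sub" are each emitted once, no
matter what order find lists the files in.  The cost is one set entry
per distinct directory.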

In any case, I think it's important to be able to turn off the
automatic directory sending feature so that situations that don't
require this can avoid the performance hit.

-Andy


