Rsync'ing lists of files

Stephane Paltani spaltani at head-cfa.harvard.edu
Fri Jun 7 15:28:01 EST 2002


Hi Everybody,

I'm new to this list, but I have been using rsync for quite some time.
First, congratulations to the rsync team for a very fine piece of software!

I'm wondering whether rsync could help me to perform the following task:

I have 5 million files on one side of the ocean, 100000 of which must be
copied to the other side. Both numbers grow with time, and occasionally, some
files must be removed from the "to be copied" list (i.e., they must be
deleted on the receiving side, but kept on the sending side). I currently
do this manually, but having rsync do it would mean that the two archives
could be sync'ed much more regularly.

I tried a combination of --include-from=<list of files> and --exclude='*',
and it seems to work. However, I have the impression that the algorithm
is far from optimal in this case: there is no usable pattern in the
file names, so I have to list all of them in the "--include-from" file.
rsync therefore makes approximately 5000000 x 100000 comparisons, and
building the file list is extremely slow (it found 8000 files after 2 hours,
i.e. ~24 hours just to build the file list).
[Correct me if my understanding of how rsync works is wrong.]
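
To make the estimate above concrete, here is a small Python sketch. It is
not rsync's code; it only models the behaviour I am assuming (every file
checked against every include pattern in turn) and contrasts it with a
simple set lookup. The function names and the toy file names are made up
for illustration.

```python
# Illustration only: a model of the cost assumed in this message,
# NOT rsync's actual implementation.
import fnmatch


def select_by_patterns(all_files, include_patterns):
    """Test each file against each include pattern in order (linear scan)."""
    selected = []
    comparisons = 0
    for path in all_files:
        for pattern in include_patterns:
            comparisons += 1
            if fnmatch.fnmatchcase(path, pattern):
                selected.append(path)
                break  # first matching pattern wins
    return selected, comparisons


def select_by_lookup(all_files, wanted):
    """Same selection, but with one hash lookup per file."""
    wanted_set = set(wanted)
    return [path for path in all_files if path in wanted_set]


if __name__ == "__main__":
    # Toy data: 2000 files, of which every 50th is wanted.
    all_files = [f"data/f{i:05d}" for i in range(2000)]
    wanted = all_files[::50]

    by_pattern, comparisons = select_by_patterns(all_files, wanted)
    assert by_pattern == select_by_lookup(all_files, wanted)
    print("pattern comparisons on toy data:", comparisons)

    # Worst case for the numbers in this message.
    print(5_000_000 * 100_000)      # 500000000000 comparisons
    # Observed rate: 8000 files found in 2 hours.
    print(100_000 / (8_000 / 2))    # 25.0 hours, i.e. roughly a day
```

Both functions select the same files; only the number of comparisons
differs, which is the point of the complaint.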

I have the impression that this situation might not be so uncommon.
So, is there another way to do this that I missed in the documentation?
What I am looking for is an option such as:
--file-list=<list of files> (which would override any "--include/--exclude").
rsync would consider only these files and ignore all the others.
I would also like a "--delete-not-in-list" flag, which would delete
any file on the receiving side that is not in the list.
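
The behaviour I have in mind can be sketched in a few lines of Python.
Both "--file-list" and "--delete-not-in-list" are only the names proposed
in this message, not existing rsync options, and plan_sync is a
hypothetical helper that works on plain lists of relative paths.

```python
# Sketch of the proposed semantics only; nothing here is an rsync API.

def plan_sync(file_list, receiver_files):
    """Return (files to consider for transfer, files to delete remotely).

    file_list      -- paths named by the proposed --file-list option
    receiver_files -- paths currently present on the receiving side
    """
    wanted = set(file_list)
    # Only the listed files are considered; everything else on the
    # sending side is ignored without any pattern matching.
    to_consider = sorted(wanted)
    # Proposed --delete-not-in-list: remove receiver files not in the list.
    to_delete = sorted(set(receiver_files) - wanted)
    return to_consider, to_delete


if __name__ == "__main__":
    consider, delete = plan_sync(
        ["obs/a.fits", "obs/b.fits"],   # current "to be copied" list
        ["obs/b.fits", "obs/old.fits"], # what the receiver holds now
    )
    print(consider)  # ['obs/a.fits', 'obs/b.fits']
    print(delete)    # ['obs/old.fits']
```

Files dropped from the list are deleted only on the receiving side; the
sending side is never touched.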

Of course, if there is another way using current rsync, it would be great!
And sorry if I missed an obvious solution...

Cheers,
Stephan



