Very surprising behaviour with --files-from

Robin Lee Powell rlpowell at digitalkingdom.org
Fri Dec 10 10:11:39 MST 2010


$ wc -l /tmp/list
1000 /tmp/list

$ rsync -i -aPv --ignore-existing --files-from=/tmp/list /backups/ ut00-s00010:/backups/
building file list ...
3937 files to consider

I am totally baffled.

That's not such a big deal, but the list I'm *actually* using has
twenty *million* files in it.  At a couple hundred files a second,
if it's going to check 4 times the number of files, that's a *huge*
time waste.  What's going on?

Here's what the list looks like:

$ head /tmp/list
cpool/b/c/5/bc5ea7a79a4824c6729645c66b562e6b
cpool/7/7/8/77865de94585b4581f07e54065c7b1e3
cpool/2/5/0/250f326bfa69c9da011f809a8b46cea7
cpool/3/3/8/3382672447e7f9a00ea755cee7ad5187
cpool/1/0/e/10eec0876f979ca8773f63e697be0adf
cpool/0/e/b/0ebf2a81c863702baa4eb38ec3cef655
cpool/3/6/c/36c915e781561292d9ae73e127504d0d
cpool/b/5/0/b50dcb17dac0808c4b5de1a9a3b747af
cpool/8/5/f/85fb8dc29ed1597c3fd0725ff91da279
cpool/9/0/8/90829abb5879fcbe39c2f55c4211b3c5

They are all like that, and they are all files, not directories.

I thought it could be rsync checking the directories that have those
files in them, but there are only 4300 directories, and when I
stopped the big version (OK, *that* was a mistake, but I was worried
about the behaviour) it was saying "28395900 files...", which is
rather a lot more than 20 million + 4300.
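One way to test the directory theory more precisely. According to the rsync man page, --files-from implies --relative (-R), and with --relative rsync by default also sends the directories implied by each path (--no-implied-dirs turns that off). The helper below, `count_entries` (a hypothetical name, not part of rsync), counts those implied directories in two ways. The first counts each distinct directory once. The second counts a directory again whenever it differs from the previous line's directory at the same depth. The second model is an ASSUMPTION about how rsync avoids re-sending implied directories, and I have not checked it against rsync's source.

```shell
#!/bin/sh
# count_entries: hypothetical helper, not part of rsync.
# Reads a --files-from style list (one relative path per line) from the
# named files, or from stdin, and prints:
#   files            - non-blank lines in the list
#   dirs (each once) - distinct parent directories implied by the paths
#   dirs (vs. prev)  - parent directories counted again whenever they differ
#                      from the previous line at the same depth
#                      (ASSUMED model of rsync behaviour, not verified)
count_entries() {
    awk -F/ '
    NF == 0 { next }                        # skip blank lines
    {
        files++
        prefix = ""
        shared = 1                          # still matching the previous path?
        for (i = 1; i < NF; i++) {          # every component except the file name
            prefix = (i == 1) ? $i : (prefix "/" $i)
            if (!(prefix in seen)) { seen[prefix] = 1; distinct++ }
            # Same directory at the same depth as the previous line: not counted again.
            if (shared && i <= prevdepth && prev[i] == prefix) continue
            shared = 0
            resent++
            prev[i] = prefix
        }
        prevdepth = NF - 1
    }
    END {
        printf "files: %d\n", files
        printf "dirs (each once): %d, total %d\n", distinct, files + distinct
        printf "dirs (vs. prev): %d, total %d\n", resent, files + resent
    }' "$@"
}
```

Run it as `count_entries /tmp/list` and as `sort /tmp/list | count_entries`. For paths shaped like cpool/x/y/z/hash in random order, the second model predicts about 15/16 + 255/256 + 4095/4096 ≈ 2.93 extra directories per file, or roughly 3,930 entries for the 1000-line list. That is close to the 3937 reported above. The first model predicts only about 1,150 directories, assuming the hash prefixes are evenly spread. If the second model holds, a sorted list should bring rsync's count down to roughly the first figure.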

This adds many hours to an already very long process; does anyone
know what's going on?

-Robin


-- 
http://singinst.org/ :  Our last, best hope for a fantastic future.
Lojban (http://www.lojban.org/): The language in which "this parrot
is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
is "na nei".   My personal page: http://www.digitalkingdom.org/rlp/

