Very surprising behaviour with --files-from

Robin Lee Powell rlpowell at digitalkingdom.org
Fri Dec 10 15:03:40 MST 2010


On Fri, Dec 10, 2010 at 01:40:04PM -0800, Steven Levine wrote:
> In <20101210171139.GD27025 at digitalkingdom.org>, on 12/10/10
>    at 09:11 AM, Robin Lee Powell <rlpowell at digitalkingdom.org> said:
> 
> Hi,
> 
> >$ rsync -i -aPv --ignore-existing --files-from=/tmp/list /backups/ ut00-s00010:/backups/
> >building file list ...
> >3937 files to consider
> 
> >That's not such a big deal, but the list I'm *actually* using has twenty
> >*million* files in it.  At a couple hundred files a second, if it's going
> >to check 4 times the number of files, that's a *huge* time waste.  What's
> >going on?
> 
> I'm not quite sure what's going on either.  What I recommend is cut your
> list down to 1 file and use
> 
>   rsync -ii -aPv --ignore-existing --files-from=/tmp/list \
>         /backups/ ut00-s00010:/backups
> 
> If this does not answer the question add one more -v.

...

I figured it out, thanks to that starting point.  Thank you so much.

I can't *quite* decide whether to be horrified at rsync or not.

Given this:

cpool/b/c/5/bc50007d8ab0221cb2b2b61e0754224c
cpool/b/c/5/bc500094bb43d0f4235363f65658d231

we have:

$ rsync -ii -aPvvv --ignore-existing --files-from=/tmp/list2 /backups/ ut00-s00010:/backups/
opening connection using: ssh ut00-s00010 rsync --server -vvvlogDtpRe.Ls "--log-format=%i%I" --partial --ignore-existing . /backups/
building file list ...
[sender] make_file(cpool,*,2)
[sender] make_file(cpool/b,*,2)
[sender] make_file(cpool/b/c,*,2)
[sender] make_file(cpool/b/c/5,*,2)
[sender] make_file(cpool/b/c/5/bc50007d8ab0221cb2b2b61e0754224c,*,0)
[sender] make_file(cpool/b/c/5/bc500094bb43d0f4235363f65658d231,*,0)
6 files to consider

which is fine.  But given this:

cpool/b/c/5/bc50007d8ab0221cb2b2b61e0754224c
cpool/7/7/8/77865de94585b4581f07e54065c7b1e3
cpool/b/c/5/bc500094bb43d0f4235363f65658d231

we have:

$ rsync -ii -aPvvv --ignore-existing --files-from=/tmp/list2 /backups/ ut00-s00010:/backups/
opening connection using: ssh ut00-s00010 rsync --server -vvvlogDtpRe.Ls "--log-format=%i%I" --partial --ignore-existing . /backups/
building file list ...
[sender] make_file(cpool,*,2)
[sender] make_file(cpool/b,*,2)
[sender] make_file(cpool/b/c,*,2)
[sender] make_file(cpool/b/c/5,*,2)
[sender] make_file(cpool/b/c/5/bc50007d8ab0221cb2b2b61e0754224c,*,0)
[sender] make_file(cpool/7,*,2)
[sender] make_file(cpool/7/7,*,2)
[sender] make_file(cpool/7/7/8,*,2)
[sender] make_file(cpool/7/7/8/77865de94585b4581f07e54065c7b1e3,*,0)
[sender] make_file(cpool/b,*,2)
[sender] make_file(cpool/b/c,*,2)
[sender] make_file(cpool/b/c/5,*,2)
[sender] make_file(cpool/b/c/5/bc500094bb43d0f4235363f65658d231,*,0)
13 files to consider

On the other hand, given this:

cpool/b/c/5/bc50007d8ab0221cb2b2b61e0754224c
cpool/b/c/5/bc500094bb43d0f4235363f65658d231
cpool/7/7/8/77865de94585b4581f07e54065c7b1e3

that is, the same files in a different order, we have:

ut00-s00005 ~ # rsync -ii -aPvvv --ignore-existing --files-from=/tmp/list2 /backups/ ut00-s00010:/backups/
opening connection using: ssh ut00-s00010 rsync --server -vvvlogDtpRe.Ls "--log-format=%i%I" --partial --ignore-existing . /backups/
building file list ...
[sender] make_file(cpool,*,2)
[sender] make_file(cpool/b,*,2)
[sender] make_file(cpool/b/c,*,2)
[sender] make_file(cpool/b/c/5,*,2)
[sender] make_file(cpool/b/c/5/bc50007d8ab0221cb2b2b61e0754224c,*,0)
[sender] make_file(cpool/b/c/5/bc500094bb43d0f4235363f65658d231,*,0)
[sender] make_file(cpool/7,*,2)
[sender] make_file(cpool/7/7,*,2)
[sender] make_file(cpool/7/7/8,*,2)
[sender] make_file(cpool/7/7/8/77865de94585b4581f07e54065c7b1e3,*,0)
10 files to consider


In the middle case, it walks the cpool/b, cpool/b/c and cpool/b/c/5
directories a second time (13 entries instead of 10) because of
*the order of the lines in the list file*.
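
To make the arithmetic concrete, here is a rough sketch of a counting
model that reproduces the three totals above (6, 13 and 10).  It is
inferred purely from the make_file traces in this message and is not
taken from rsync's source: it assumes rsync remembers only the
directory of the previous line in the list and re-emits every parent
directory that differs from it.  The function name entries_considered
is made up for illustration.

```python
def entries_considered(paths):
    """Count "files to consider" under a simplified model.

    Assumption (inferred from the traces, not from rsync's code): only
    the previous path's directory is remembered, so any parent directory
    not shared with it is emitted again.
    """
    prev_dirs = []
    total = 0
    for path in paths:
        parts = path.strip("/").split("/")
        dirs = parts[:-1]  # parent directory components

        # Count leading directory components shared with the previous path.
        shared = 0
        for a, b in zip(prev_dirs, dirs):
            if a != b:
                break
            shared += 1

        # Newly emitted parent directories, plus the file itself.
        total += (len(dirs) - shared) + 1
        prev_dirs = dirs
    return total


if __name__ == "__main__":
    middle_case = [
        "cpool/b/c/5/bc50007d8ab0221cb2b2b61e0754224c",
        "cpool/7/7/8/77865de94585b4581f07e54065c7b1e3",
        "cpool/b/c/5/bc500094bb43d0f4235363f65658d231",
    ]
    print(entries_considered(middle_case))          # as listed
    print(entries_considered(sorted(middle_case)))  # after sorting
```

Under this model the unsorted middle list costs 13 entries and its
sorted form costs 10, matching the rsync output above.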

Which is fine and appropriate in terms of RAM usage, I guess, but
very very surprising.  I call this a feature, but a documentation
fail.  At least, I can't find anything in the docs that mentions
"and you'd better sort the file or you won't like the results at
all".
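
For what it's worth, a minimal sketch of sorting the list before
handing it to --files-from might look like the following.  The
function name and the output path are hypothetical, and it reads the
whole list into memory, which may or may not be acceptable for twenty
million lines.  A plain lexical sort keeps all paths under the same
directory next to each other, which is all that seems to matter here.

```python
def write_sorted_list(src, dst):
    """Write the paths in src to dst, sorted, with duplicates removed.

    surrogateescape lets file names that are not valid UTF-8 pass
    through unchanged.
    """
    with open(src, encoding="utf-8", errors="surrogateescape") as f:
        # Drop blank lines and collapse duplicate entries.
        paths = {line.rstrip("\n") for line in f if line.strip()}

    with open(dst, "w", encoding="utf-8", errors="surrogateescape") as f:
        for path in sorted(paths):
            f.write(path + "\n")


# Hypothetical usage: /tmp/list2.sorted is an invented output name.
# write_sorted_list("/tmp/list2", "/tmp/list2.sorted")
```

The sorted file would then be passed as
--files-from=/tmp/list2.sorted in place of the original list.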

-Robin

-- 
http://singinst.org/ :  Our last, best hope for a fantastic future.
Lojban (http://www.lojban.org/): The language in which "this parrot
is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
is "na nei".   My personal page: http://www.digitalkingdom.org/rlp/

