why does rsync translate user@host into '$RSYNC_RSH -l user host'?

Wed Oct 19 21:39:54 MDT 2011

On Thu, 20 Oct 2011, Cameron Simpson wrote:

> On 19Oct2011 12:02, Benjamin R. Haskell <rsync at benizi.com> wrote:
> | On Wed, 19 Oct 2011, Kevin Korb wrote:
> | >Because it is an even bigger joy to be able to type 'ssh newhost'
> | >and have it just work even though you can't talk to newhost.  You
> | >can do that by properly configuring ssh in ~/.ssh/config with
> | >something like this:
> | >
> | >Host accessiblehost
> | > User cameron
> | >
> | >Host newhost
> | > ProxyCommand ssh accessiblehost -W %h:%p
> | > User root
> |
> | +1.  Seems easier than a DIY script.
>
> You guys are kidding, surely? You hand-edit your ssh configs for every
> ad hoc chain of hosts you might want?

No.  But, why do you have so many ad hoc chains of hosts in the first 
place?  I don't fully understand the use case, I s'pose.

You can replace 'Host newhost' with 'Host *.inaccessible.subnet'.
That's what %h:%p affords: ssh expands %h to the target host and %p to
its port, so one ProxyCommand covers every host the pattern matches.
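To make the wildcard idea concrete, here is a rough Python sketch of what ssh does with such a clause: match the target host against the Host patterns and, on a match, expand %h and %p in the ProxyCommand. It is an approximation, not ssh's implementation: fnmatch stands in for ssh's own pattern matcher, and CONFIG and proxy_command_for() are made-up names for illustration.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# Hypothetical, pre-parsed stand-in for ~/.ssh/config:
# each entry is (Host patterns, ProxyCommand template).
CONFIG: List[Tuple[List[str], str]] = [
    (["*.inaccessible.subnet"], "ssh accessiblehost -W %h:%p"),
]


def proxy_command_for(host: str, port: int = 22) -> Optional[str]:
    """Return the expanded ProxyCommand for host, or None if no pattern matches."""
    for patterns, template in CONFIG:
        # fnmatchcase approximates ssh's '*' and '?' wildcards; it also accepts
        # '[...]' classes and does not implement ssh's '!' negation.
        if any(fnmatchcase(host, pattern) for pattern in patterns):
            # %h -> the host being connected to, %p -> its port
            # (other tokens and '%%' escapes are not handled here).
            return template.replace("%h", host).replace("%p", str(port))
    return None
```

For example, proxy_command_for("a.inaccessible.subnet") gives "ssh accessiblehost -W a.inaccessible.subnet:22", while a host outside the pattern gets no ProxyCommand at all.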

> So when I go:
>
>  for h in a b c d e; do sshto transithost!$h do_foo; done
>
> you find it easier to make a bunch of ssh config clauses first, each 
> with cool distinctive names? I salute your typing skills.

In the example ~/.ssh/config,
Host newhost
becomes:
Host a b c d e

and

for h in a b c d e; do sshto transithost!$h do_foo; done
becomes:
for h in a b c d e; do ssh $h do_foo; done

> (Besides, I wrote my first sshto tool before the ProxyCommand 
> directive existed.)

That, on the other hand, is a valid rationale.

> | But to answer the original question:
> |
> | >On 10/19/11 03:40, Cameron Simpson wrote:
> | >>Why does rsync believe it knows more about the use of the token 
> | >>to the left of the colon than the program which will be used as 
> | >>the remote connection?
> | >>
> | >>[...]
> | >>
> | >>what it invokes is:
> | >>
> | >>  sshto -l cameron@accessiblehost!root newhost rsync .....
> | >>
> | >>Since sshto is my own tool I can probably have it cope with this 
> | >>mangling of my target string into "-l foo bah", and undo it.
> | >>
> | >>But WHY does rsync believe this is desirable, or even necessary?
> |
> | rsync has to parse the URL you're passing.  The fact that it then 
> | takes that and runs something like `$RSYNC_RSH -l user host` is 
> | because rsync expects it's handing the connection duties off to 
> | something that uses rsh-like calling conventions.  So, it's 
> | "desirable" because rsh-like tools traditionally expect it.
>
> But rsh-like tools _all_ accept user@host already.
> They don't "expect" the "-l" form - they cope with it.
>
> This argument does not make it desirable unless rsh or ssh don't cope
> with user@host. Which they do.

Rsh doesn't.

$ rsh root@localhost
rcmd: getaddrinfo: No address associated with hostname

(Tested under Gentoo Linux and FreeBSD.)
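That is the practical case for the split. A simplified Python sketch of the translation under discussion: take a "[user@]host:path" spec, peel off the user, and build an rsh-style command line, RSH [-l USER] HOST COMMAND. This is not rsync's parser, and the function names are made up. It assumes the host part is split at the last '@', which matches the sshto command line Cameron reported; it leaves out IPv6 literals, rsync:// URLs and the "::" daemon syntax, and approximates the splitting of $RSYNC_RSH with shlex.split.

```python
import shlex
from typing import List, Optional, Tuple


def split_hostspec(spec: str) -> Tuple[Optional[str], str, str]:
    """Split a simplified "[user@]host:path" spec into (user, host, path)."""
    hostpart, sep, path = spec.partition(":")
    if not sep:
        raise ValueError(f"not a remote spec: {spec!r}")
    # Assumption: split at the LAST '@', so "a@b!c@host" -> user "a@b!c".
    user, at, host = hostpart.rpartition("@")
    return (user if at else None), host, path


def remote_argv(rsh: str, spec: str, remote_cmd: List[str]) -> List[str]:
    """Build an rsh-style command line: RSH [-l USER] HOST CMD..."""
    user, host, _path = split_hostspec(spec)
    # The rsh command may carry its own options, e.g. "ssh -p 2222".
    argv = shlex.split(rsh)
    if user is not None:
        argv += ["-l", user]  # the form real rsh understands
    argv.append(host)
    return argv + remote_cmd
```

With plain rsh, remote_argv("rsh", "root@localhost:/tmp", ["rsync", "--server"]) yields ["rsh", "-l", "root", "localhost", "rsync", "--server"], a form real rsh accepts, whereas handing it "root@localhost" fails as shown above.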

> | If rsync didn't parse the URL and split it out, each tool would have 
> | to do its own {user}@{host} parsing.  So, it's not fully 
> | "necessary". (Most of the tools probably do have that kind of 
> | parsing.)  It just makes things easier for tools that use the '-l' 
> | convention.
>
> The point here is that rsync is presuming to know about the tool. The
> whole point of the -e and $RSYNC_RSH stuff is to use arbitrary tools.

At the time the feature was introduced (it's in the first revision that 
made it into git, so, pre-1996), the point wasn't to insert arbitrary 
commands, it was to allow the use of alternatives to rsh.  Probably 
remsh, a tool that provided an alternative to rsh.  Later, ssh, another 
tool that could replace rsh, and which understood `-l $user`.

> Having rsync pull out the user doesn't _help_ rsh or ssh, both of
> which have always (AFAIR) accepted user@host, and it does raise the
> implementation bar for other tools for no actual benefit.
>
> Has anyone a use case that _breaks_ if rsync doesn't pull out what it 
> imagines is the "user@" part?

Using "real" rsh, apparently.
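For a wrapper that would rather see a single target, Cameron's idea of having sshto undo the split is small. A minimal sketch, with rejoin_user() as a hypothetical name; it assumes the "-l USER" pair arrives first and is immediately followed by the host, as in the sshto invocation quoted above, and it takes the argument list without the program name.

```python
from typing import List


def rejoin_user(argv: List[str]) -> List[str]:
    """Fold a leading "-l USER HOST" back into a single "USER@HOST" target."""
    if len(argv) >= 3 and argv[0] == "-l":
        user, host, rest = argv[1], argv[2], argv[3:]
        return [f"{user}@{host}"] + rest
    # No leading "-l": pass the arguments through unchanged.
    return list(argv)
```

Applied to ["-l", "cameron@accessiblehost!root", "newhost", "rsync"], it restores "cameron@accessiblehost!root@newhost" as the first argument.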

-- 
Best,
Ben