[jcifs] Re: SMB URL encoding/decoding

Sun Feb 24 12:51:23 EST 2002

Michael B Allen wrote:
> 
> On Sat, 23 Feb 2002 15:10:11 -0600
> "Christopher R. Hertel" <crh at ubiqx.org> wrote:
> 
> > Well, the first comment is that the SMB URL needs to meet the
> > requirements of a URL string as described in RFC 2396.  There are
> > two reasons for this.  The first is that IETF will never accept it if
> > it doesn't, and the second is
> 
> Why wouldn't they if it followed the ftp RFC James is referencing?

I didn't say it wouldn't.  I was speaking generally.

> > that most of the browsers out there have generic URL parsers.  Java
> > also has a generic URL parser, IIRC.
> 
> But it's very limited. You have to pretty much override it entirely with
> the exception of the most trivial of URLs. It only decodes the host,
> port, file (path), and ref components and it will certainly fail to do
> even that with anything that has arbitrary characters in the password
> of some auth info.
> 
>   http://java.sun.com/products/jdk/1.2/docs/api/java/net/URL.html

Which arbitrary characters?  This is where precedence comes in.  If you
parse based on the '/' character first (and escape any '/' characters in the
<server> string) then you will never get confused by '@'s elsewhere in the
URL.
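
To illustrate the escaping half of that, here is a minimal sketch in Python. The helper name build_smb_url and the sample credentials are made up for this example; nothing here comes from jCIFS or from the draft. It only shows that once the userinfo is percent-encoded, a '/' or '@' in the password can no longer be mistaken for a delimiter:

```python
from urllib.parse import quote, unquote

def build_smb_url(user, password, host, path):
    """Assemble an smb:// URL, escaping the credentials first.

    Hypothetical helper for illustration only.
    """
    # safe="" forces '/', '@' and ':' inside the credentials to be
    # percent-encoded, so none of them can act as a delimiter later.
    userinfo = quote(user, safe="") + ":" + quote(password, safe="")
    # In the path, '/' separates segments and '@' has no special meaning,
    # so both are left alone.
    return "smb://" + userinfo + "@" + host + "/" + quote(path, safe="/@")

url = build_smb_url("chris", "p@ss/w:rd", "server", "share/im@home/text.txt")
print(url)

# The original password is recovered by unescaping after parsing.
print(unquote("p%40ss%2Fw%3Ard"))
```

The only unescaped '@' before the first '/' of the result is the one that ends the userinfo.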

I will try to get time to look at the link above.

> Perhaps we could just drop the authentication credentials and pass them
> as QUERY_STRING parameters instead.

No, because the generic URL/URI syntax already supports the use of
credentials as part of the <server> field.  RFC2396:

3.2.2. Server-based Naming Authority

   URL schemes that involve the direct use of an IP-based protocol to a
   specified server on the Internet use a common syntax for the server
   component of the URI's scheme-specific data:

      <userinfo>@<host>:<port>

   where <userinfo> may consist of a user name and, optionally, scheme-
   specific information about how to gain authorization to access the
   server.  The parts "<userinfo>@" and ":<port>" may be omitted.

      server        = [ [ userinfo "@" ] hostport ]

   The user information, if present, is followed by a commercial at-sign
   "@".

      userinfo      = *( unreserved | escaped |
                         ";" | ":" | "&" | "=" | "+" | "$" | "," )

   Some URL schemes use the format "user:password" in the userinfo
   field. This practice is NOT RECOMMENDED, because the passing of
   authentication information in clear text (such as URI) has proven to
   be a security risk in almost every case where it has been used.
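
The userinfo production quoted above can be checked mechanically. The following is a rough sketch in Python, assuming the RFC 2396 definitions of "unreserved" (alphanumerics plus the marks - _ . ! ~ * ' ( )) and "escaped" ('%' followed by two hex digits). The function name is_valid_userinfo is invented for this example:

```python
import re

# RFC 2396 character classes: alphanum plus the "mark" characters.
UNRESERVED = r"A-Za-z0-9\-_.!~*'()"
# escaped = "%" hex hex
ESCAPED = r"%[0-9A-Fa-f]{2}"
# userinfo = *( unreserved | escaped | ";" | ":" | "&" | "=" | "+" | "$" | "," )
USERINFO_RE = re.compile(r"(?:[" + UNRESERVED + r";:&=+$,]|" + ESCAPED + r")*")

def is_valid_userinfo(text):
    """Return True if text matches the RFC 2396 userinfo production."""
    return USERINFO_RE.fullmatch(text) is not None

print(is_valid_userinfo("user:pass"))    # plain user:password form
print(is_valid_userinfo("user:p%40ss"))  # '@' escaped as %40
print(is_valid_userinfo("user:p@ss"))    # raw '@' is not in the production
```

Note that a raw '@' or '/' is rejected, which is why those characters have to be escaped if they appear in a user name or password.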

Now, as you point out, the RFC does not recommend the inclusion of the
password field.  I agree (and put notes about it in the draft) but I
seriously doubt that the commercial implementors would be willing to drop
the password string.

> In practice it would probably be
> easier to work with anyway instead of piecing together user info to build
> a full URL. Or maybe just drop the password field since it's the culprit
> and source of security concerns cited elsewhere for obvious reasons.

Some versions of DOS support the inclusion of the password in UNC strings. 
I think DEC Pathworks used to allow for this too.

The problem, by the way, is not that the password will be transmitted via
the protocol.  Even if the URL form did not allow for the password, the
application would have to prompt for one and then go through the normal
authentication exchange for that protocol.  The password would still be
involved.

The problem is posting URLs for various sites and such.  Advertising your
URL string.  That sort of thing.  And, well, if you are going to offer to
allow people to use your password then does it matter if it's in the URL
string or included nearby in the text?  People handing out URLs should not
include their password...if they have a clue.

> Let's not lose the focus of the real problem here though. SMB servers
> allow the '@' sign but the various URL RFCs do not. If we can circumvent
> this one issue it will not be necessary to URL encode/decode anything.

The '@' sign has special meaning in the <server> field.  Outside of the
server field, it does not have meaning.  More below...

> >
> > As with most parsers, there is an operator hierarchy.  I *think* that
> > the '/' character is higher precedence than the '@' but I am not sure,
> > as I haven't looked at it recently.
> 
> I don't see how assigning precedence helps.

Consider:

ftp://user:pass@foo.com/some/dir/im@home/text.txt

If the '/' has precedence over '@' then the parse tree is (roughly):

                        ["://"]
                          / \
           [scheme: "FTP"]  ['/']
                             / \
   [server: "user:pass@foo.com"]  [abs_path: "some/dir/im@home/text.txt"]

The <abs_path> then parses into <path_segments> and the '@' in "im@home"
is protected.  There is no way to confuse it with the '@' in the <server>
field.

I did short-cut some of the syntax in the above, but the RFC makes it pretty
clear.  The '/' has higher precedence than the '@', so the <server> field is
separated from the <path> before you even look for the '@'.
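
For what it's worth, the same precedence can be written out as a few lines of Python. This is a simplified sketch, not a full RFC 2396 parser: the function name split_url is made up, and it ignores ports, queries, and fragments. The point is only the order in which the delimiters are consulted:

```python
def split_url(url):
    """Split a URL by delimiter precedence: '://' first, then '/', then '@'.

    Simplified illustration: no port, query, or fragment handling.
    """
    scheme, rest = url.split("://", 1)
    # '/' outranks '@': cut the server field off before looking for '@'.
    server, _, abs_path = rest.partition("/")
    # Only now is '@' examined, and only inside the server field.
    # (A valid server field holds at most one unescaped '@'.)
    userinfo, at, host = server.rpartition("@")
    if not at:
        userinfo = None
    return {"scheme": scheme, "userinfo": userinfo,
            "host": host, "abs_path": abs_path}

parts = split_url("ftp://user:pass@foo.com/some/dir/im@home/text.txt")
print(parts)
```

The '@' in "im@home" is never looked at as a delimiter, because by the time the code searches for '@' it is working on the server field alone.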

Am I making any sense?

Chris -)-----

-- 
Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org