[jcifs] Re: SMB URL encoding/decoding

Sun Feb 24 13:28:55 EST 2002

Christopher R. Hertel wrote:

>Michael B Allen wrote:
>
>>On Sat, 23 Feb 2002 15:10:11 -0600
>>"Christopher R. Hertel" <crh at ubiqx.org> wrote:
>>
>>>Well, the first comment is that the SMB URL needs to meet the
>>>requirements of a URL string as described in RFC 2396.  There are
>>>two reasons for this.  The first is that IETF will never accept it if
>>>it doesn't, and the second is
>>>
>>Why wouldn't they if it followed the ftp RFC James is referencing?
>>
>
>I didn't say it wouldn't.  I was speaking generally.
>
>>>that most of the browsers out there have generic URL parsers.  Java
>>>also has a generic URL parser, IIRC.
>>>
>>But it's very limited. You have to pretty much override it entirely with
>>the exception of the most trivial of URLs. It only decodes the host,
>>port, file (path), and ref components and it will certainly fail to do
>>even that with anything that has arbitrary characters in the password
>>of some auth info.
>>
>>  http://java.sun.com/products/jdk/1.2/docs/api/java/net/URL.html
>>
>
>Which arbitrary characters?  This is where precedence comes in.  If you
>parse based on the '/' character first (and escape any '/' characters in the
><server> string) then you will never get confused by '@'s elsewhere in the
>URL.
>
>I will try to get time to look at the link above.
>
>>Perhaps we could just drop the authentication credentials and pass them
>>as QUERY_STRING parameters instead.
>>
>
>No, because the generic URL/URI syntax already supports the use of
>credentials as part of the <server> field.  RFC2396:
>
>
>3.2.2. Server-based Naming Authority
>
>   URL schemes that involve the direct use of an IP-based protocol to a
>   specified server on the Internet use a common syntax for the server
>   component of the URI's scheme-specific data:
>
>      <userinfo>@<host>:<port>
>
>   where <userinfo> may consist of a user name and, optionally, scheme-
>   specific information about how to gain authorization to access the
>   server.  The parts "<userinfo>@" and ":<port>" may be omitted.
>
>      server        = [ [ userinfo "@" ] hostport ]
>
>   The user information, if present, is followed by a commercial at-sign
>   "@".
>
>      userinfo      = *( unreserved | escaped |
>                         ";" | ":" | "&" | "=" | "+" | "$" | "," )
>
>   Some URL schemes use the format "user:password" in the userinfo
>   field. This practice is NOT RECOMMENDED, because the passing of
>   authentication information in clear text (such as URI) has proven to
>   be a security risk in almost every case where it has been used.
>
>
>Now, as you point out, the RFC does not recommend the inclusion of the
>password field.  I agree (and put notes about it in the draft) but I
>seriously doubt that the commercial implementors would be willing to drop
>the password string.
>
And I don't think it should be dropped either. It is nicer for spiders 
if the URL alone gets them the file, rather than having to fetch a URL 
and then go and find the password separately.

>>Let's not lose focus on the real problem here though. SMB servers
>>allow the '@' sign but the various URL RFCs do not. If we can circumvent
>>this one issue it will not be necessary to URL encode/decode anything.
>>
>
>The '@' sign has special meaning in the <server> field.  Outside of the
>server field, it does not have meaning.  More below...
>
Exactly

>
>>>As with most parsers, there is an operator hierarchy.  I *think* that
>>>the '/' character is higher precedence than the '@' but I am not sure,
>>>as I haven't looked at it recently.
>>>
>>I don't see how assigning precedence helps.
>>
>
>Consider:
>
>ftp://user:pass@foo.com/some/dir/im@home/text.txt
>
>If the '/' has precedence over '@' then the parse tree is (roughly):
>
>                        ["://"]
>                          / \
>           [scheme: "FTP"]  ['/']
>                             / \
>  [server: "user:pass@foo.com"]  [abs_path: "some/dir/im@home/text.txt"]
>
>
>The <abs_path> then parses into <path_segments> and the '@' in "im@home"
>is protected.  There is no way to confuse it with the '@' in the <server>
>field.
>
>I did short-cut some of the syntax in the above, but the RFC makes it pretty
>clear.  The '/' has higher precedence than the '@', so the <server> field is
>separated from the <path> before you even look for the '@'.
>
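
The slash-first split described in the quoted text can be sketched in Java 
roughly as follows. This is only an illustration, not jCIFS code and not 
anything the RFC prescribes: the class name SlashFirstSplit and the methods 
split and splitServer are hypothetical, and no escaping or decoding is 
attempted.

```java
/** Hypothetical helper for illustration; not part of jCIFS or RFC 2396. */
final class SlashFirstSplit {

    /** Returns {scheme, server, path}; the path has no leading '/'. */
    static String[] split(String url) {
        int sep = url.indexOf("://");
        if (sep < 0) {
            throw new IllegalArgumentException("no \"://\" in " + url);
        }
        String scheme = url.substring(0, sep);
        String rest = url.substring(sep + 3);

        // Cut at the first '/' BEFORE looking for any '@'.
        int slash = rest.indexOf('/');
        String server = slash < 0 ? rest : rest.substring(0, slash);
        String path = slash < 0 ? "" : rest.substring(slash + 1);
        return new String[] { scheme, server, path };
    }

    /** Returns {userinfo, hostport}; userinfo is null when no '@' is present. */
    static String[] splitServer(String server) {
        int at = server.indexOf('@');
        if (at < 0) {
            return new String[] { null, server };
        }
        return new String[] { server.substring(0, at), server.substring(at + 1) };
    }

    public static void main(String[] args) {
        String[] p = split("ftp://user:pass@foo.com/some/dir/im@home/text.txt");
        String[] s = splitServer(p[1]);
        System.out.println("scheme   = " + p[0]);
        System.out.println("userinfo = " + s[0]);
        System.out.println("hostport = " + s[1]);
        System.out.println("path     = " + p[2]);
    }
}
```

Because the '@' search only ever sees the server substring, the '@' in 
"im@home" is never examined as a userinfo separator.
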
No, I don't agree with the reasoning here (although the outcome is the same).

Neither '/' nor '@' has precedence over anything.  It is just that this 
is the only legal combination.

It is not legal to have ftp://user:pass@foo@foo:21/some/thing/else@/here

We are only allowed zero or one '@' between the first two '/'s and the 
next '/', and if that '@' appears we are only allowed zero or one ':' 
between the second '/' and the '@'.

Maybe it is my interpretation of the word precedence (to mean "rank 
higher") that is getting in my way, but I would not say that '/' comes 
before '@'.  To say that implies (to me at least) that you could have 
ftp://user:pass@foo@foo:21/some/thing/else@/here
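
The counting rule above can be written down as a small check. This is a 
sketch of the rule as stated in this message only: the class name 
ServerFieldCheck and its methods are hypothetical, and the one-colon limit 
is this message's reading, since the userinfo grammar quoted earlier lists 
":" as an allowed character without giving a count.

```java
/** Hypothetical check for illustration; not part of jCIFS or RFC 2396. */
final class ServerFieldCheck {

    /** Counts occurrences of c in s. */
    static int count(String s, char c) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    /** Applies the "zero or one '@', then zero or one ':'" rule to the server field. */
    static boolean isLegal(String url) {
        int sep = url.indexOf("://");
        if (sep < 0) {
            return false;
        }
        String rest = url.substring(sep + 3);
        int slash = rest.indexOf('/');
        // The server field is everything between "//" and the next '/'.
        String server = slash < 0 ? rest : rest.substring(0, slash);

        int ats = count(server, '@');
        if (ats > 1) {
            return false; // more than one '@' in the server field
        }
        if (ats == 1) {
            String userinfo = server.substring(0, server.indexOf('@'));
            if (count(userinfo, ':') > 1) {
                return false; // more than one ':' before the '@'
            }
        }
        return true; // '@' characters after the server field are not inspected
    }

    public static void main(String[] args) {
        System.out.println(isLegal("ftp://user:pass@foo.com/some/dir/im@home/text.txt"));
        System.out.println(isLegal("ftp://user:pass@foo@foo:21/some/thing/else@/here"));
    }
}
```
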

>
>Am I making any sense?
>
I think we may be arguing the same thing, but from the earlier posts I 
thought our positions were slightly different.