[jcifs] Re: SMB URL encoding/decoding

Mon Feb 25 11:39:46 EST 2002

James Nord wrote:
:
> Why is
> 
> smb://HI137/D$/Documents%20and%20Settings/ryar/seti@home.txt
> 
> incorrect? @ is a valid pchar.

Because it's a typo.  Ooops.  ;)
Only the spaces needed to be escaped.
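
A minimal sketch of that in Python, assuming the standard urllib.parse
module (the "safe" list here is my own choice for this one example, not
something the draft specifies):

```python
from urllib.parse import quote, unquote

path = "/D$/Documents and Settings/ryar/seti@home.txt"

# Leave '/', '$', and '@' alone: all three are legal in a path segment
# under RFC 2396, so only the spaces end up escaped.
escaped = quote(path, safe="/$@")
url = "smb://HI137" + escaped
print(url)

# Decoding the escaped path gives back the original string.
print(unquote(escaped) == path)
```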

> We could always escape all the valid characters-
> 
> smb://HI137/D$/%44%6f%63...
> 
> I see no reason why that is more or less correct?

Right.  You could escape every character.  Remember, though, that we are
talking about people typing the URL string at a command prompt or a browser
window or somesuch.
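
To illustrate, here is a small sketch showing that the fully escaped form
and the minimally escaped form decode to the same string.  The helper
escape_everything() is a made-up name for this example, and it assumes
the text is encoded as UTF-8 before escaping:

```python
from urllib.parse import unquote

def escape_everything(text):
    # Percent-encode every byte of the UTF-8 form, even the safe ones.
    return "".join("%{:02x}".format(b) for b in text.encode("utf-8"))

minimal = "Documents%20and%20Settings"
full = escape_everything("Documents and Settings")

print(full[:9])                          # the first three escapes
print(unquote(full) == unquote(minimal))
```

Both forms are equally valid to a parser; the minimal one is simply the
only one a person would want to type.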

> >RFC2396 goes into detail regarding
> >the use of whitespace in a URL string (they don't like it), but many
> >browsers will accept the spaces anyway.  (Just as many browsers will
> >accept really bad HTML code and render it anyway...browsers are in the
> >business of making things easy when they can.)
> >
> Always be strict on forming and forgiving on parsing.

Yep.

> > A note... I looked all through the RFC and found nothing about
> > translating the '+' into a space.  Annoying, as I know it was there
> > in the early days.
> >
> Is this not just special for form data in HTTP?  I don't recall seeing
> this escape sequence in a path before. (not really important now in any
> case)

You may well be right.  I know that the + is sometimes used, but I can't
remember where or why.  It's a big "I haven't a clue" from me on this one.
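
For what it's worth, the '+' for space convention does belong to HTML
form data (application/x-www-form-urlencoded) rather than to URL paths.
A quick sketch of the difference, assuming Python's urllib.parse, which
keeps the two behaviors in separate functions:

```python
from urllib.parse import unquote, unquote_plus

name = "a+b%20c.txt"

# Path-style decoding: '+' is an ordinary character and stays put.
as_path = unquote(name)

# Form-style decoding: '+' is turned into a space as well.
as_form = unquote_plus(name)

print(repr(as_path))
print(repr(as_form))
```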

> >So the problem is that you need to escape a ';' if you want to use it
> >in a path, but not if you use it in <userinfo>, and you have to escape
> >an '@' if you want to use it in <userinfo> but not in the path.
> >
> And then the domain bit falls over as it is permitted to have many ;'s in
> there :-(

Um, no...  The generic <userinfo> field allows as many ';'s as you like, but
the SMB URL specifies a new (descendant) syntax for the <userinfo> field that
makes the ';' a delimiter within that field.

So, we replace:
    userinfo  = *( unreserved | escaped |
                 ";" | ":" | "&" | "=" | "+" | "$" | "," )
with:
    userinfo  = user [ ":" password ]
    user      = [ ntdomain ";" ] username

and then to be pedantic we would specify the valid characters for
username, password, and ntdomain.
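
A rough sketch of how a parser might pull the three parts out of the
replacement grammar above.  The function name split_userinfo() is invented
for this example, and it assumes the delimiters are located first and the
escapes decoded afterwards, so that an escaped ':' or ';' never gets
mistaken for a delimiter:

```python
from urllib.parse import unquote

def split_userinfo(userinfo):
    """Split '[ntdomain;]username[:password]' into its three parts."""
    # The first ':' separates the user part from the optional password.
    user, sep, password = userinfo.partition(":")
    password = unquote(password) if sep else None

    # Within the user part, a ';' separates the optional ntdomain.
    if ";" in user:
        ntdomain, _, username = user.partition(";")
        ntdomain = unquote(ntdomain)
    else:
        ntdomain, username = None, user

    return ntdomain, unquote(username), password

print(split_userinfo("MYDOM;fred:secret"))
print(split_userinfo("fred%40work:p%3Aw"))
```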

> >It may contain anything that <userinfo> may contain, including a colon.
> >The *first* colon in <userinfo> is used as a delimiter.  
> >
> But the *smb url draft* does not allow for more than one colon in the 
> userinfo part.
> 
>       smb_server    = [ [ smb_userinfo "@" ] smb_srv_name [ ":" port ] ]
> 
>       smb_userinfo  = [ ntdomain ";" ] username [ ":" password ]
>       ntdomain      = *( unreserved | escaped |
>                          "&" | "=" | "+" | "$" | "," )
>       username      = *( unreserved | escaped |
>                          "&" | "=" | "+" | "$" | "," )
>       password      = *( unreserved | escaped |
>                          "&" | "=" | "+" | "$" | "," )

Yes.  You're right.  I did a better job in the draft than in my e-mail.
:)
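
As a sketch of what the draft's character rules amount to, here is a
check for a single ntdomain, username, or password field.  It assumes the
RFC 2396 definitions of <unreserved> (alphanumerics plus the marks
- _ . ! ~ * ' ( )) and <escaped> ("%" followed by two hex digits); the
names FIELD and is_valid_field() are made up for the example:

```python
import re

# RFC 2396: unreserved = alphanum | mark
UNRESERVED = r"A-Za-z0-9\-_.!~*'()"

# One field: any run of unreserved chars, "&=+$,", or %XX escapes.
FIELD = re.compile(r"(?:[" + UNRESERVED + r"&=+$,]|%[0-9A-Fa-f]{2})*")

def is_valid_field(text):
    # True if text matches the draft's ntdomain/username/password rule.
    return FIELD.fullmatch(text) is not None

print(is_valid_field("fred"))    # plain name
print(is_valid_field("a:b"))     # bare ':' is a delimiter, not field data
print(is_valid_field("a%3Ab"))   # the same colon, escaped
```

Note that ':', ';', and '@' are all absent from the field alphabet, which
is why each of them can serve as an unambiguous delimiter.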

> Ah, doesn't the draft have precedence in this case over the URL RFC as 
> it is more explicit?
> (yes draft changing work in progress etc...) but I always look at the 
> most specific ;-)

I think of the draft as defining a descendant type.  Again, you're correct
here.

Mike wrote:
> 
> Correction: An unescaped '%' would also cause it to fail because we
> have to URL decode regardless of how forgiving the parser is and the
> '%' might be interpreted as the beginning of an escape.

Yeah.  I like James' comment about when to be strict and when to be
forgiving.
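
To make Mike's point concrete, a small sketch of a decoder that refuses a
stray '%' instead of guessing.  strict_unquote() is a hypothetical helper,
not anything in jCIFS; it is built on Python's urllib.parse.unquote, which
on its own quietly leaves a malformed escape untouched:

```python
import re
from urllib.parse import unquote

# A '%' that is not followed by exactly two hex digits.
BAD_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")

def strict_unquote(text):
    # Reject malformed escapes up front, then decode normally.
    if BAD_PERCENT.search(text):
        raise ValueError("stray '%' in " + repr(text))
    return unquote(text)

print(strict_unquote("100%25.txt"))   # properly escaped percent sign
print(unquote("50%20off"))            # unescaped '%' read as an escape
```

The second call shows the nastier case: a literal '%' that happens to be
followed by two hex digits cannot be detected at all, so it has to be
written as %25 by whoever forms the URL.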

Chris -)-----

-- 
Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org