[jcifs] SMB URL parsing buglet?

Fri Sep 6 02:14:26 EST 2002

"Michael B. Allen" wrote:
:
> > The string form of the URL, when displayed, should always be presented
> > as fully encoded.
> 
> What do you mean by "presented"? Certainly you don't unconditionally
> encode it when for example toString() is called etc.

Why not?  

> Right, I think '@' is the real culprit and it is the only character
> that really needs to be encoded without regard for special characters
> reserved by CIFS (can't think of any clashes there). And ... of course
> you have to encode '%' to escape the encoding itself.

My thinking is that something like toString() should return a
syntactically correct URL string.  That would mean that all reserved
characters would be encoded.  The difficulty is that each field in the
URL has its own set of reserved characters.  I imagine that the way to do
this is to define a URLField class which contains the list of characters
that is always reserved:

reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
              "$" | "," | "[" | "]"

Descendant types would add their reserved values to that list.  When
generating a string from the internal representation of the field, any
reserved values in the field would be encoded.
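
A rough sketch of that idea in Java follows.  Nothing here is jCIFS
code: the constructor arguments, the encoded() method, the
PathSegmentField subclass and its extra '#' are all assumptions made
for illustration, as is the choice of UTF-8 for non-ASCII characters.
It also adds '%' to the reserved set, per the point quoted above about
escaping the encoding itself, and it does not try to cover every
character that RFC 2396 excludes.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical base class: a decoded field value plus the always-reserved characters. */
class URLField {
    // The reserved list quoted above.
    private static final String ALWAYS_RESERVED = ";/?:@&=+$,[]";

    private final Set<Character> reserved = new HashSet<>();
    private final String raw;   // internal (decoded) representation

    /** extraReserved lets a descendant type add its own reserved characters. */
    URLField(String raw, String extraReserved) {
        this.raw = raw;
        // '%' is included so that the escape character itself gets escaped.
        for (char c : (ALWAYS_RESERVED + "%" + extraReserved).toCharArray()) {
            reserved.add(c);
        }
    }

    /** Returns the field with reserved, control, space and non-ASCII bytes as %XX. */
    String encoded() {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw.getBytes(StandardCharsets.UTF_8)) {   // UTF-8 is an assumption
            int v = b & 0xFF;
            if (v <= 0x20 || v >= 0x7F || reserved.contains((char) v)) {
                sb.append(String.format("%%%02X", v));
            } else {
                sb.append((char) v);
            }
        }
        return sb.toString();
    }
}

/** Hypothetical descendant: one path segment, which also reserves '#'. */
class PathSegmentField extends URLField {
    PathSegmentField(String raw) {
        super(raw, "#");
    }
}
```

With this sketch, new PathSegmentField("My Download Files").encoded()
gives "My%20Download%20Files", and a '@' in any field comes back as
"%40".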

> But otherwise I cannot really think of a reason to encode anything at
> all.

The reason has to do with transportability, which is a key feature of
URLs.  I should be able to give you a URL string (the RFCs talk about
writing URLs on napkins at restaurants) and it should be meaningful.
Putting aside the semantic problem of NetBIOS namespaces (which is
inherent in SMB, so we can't really fix it in the SMB URL unless we
disallow NetBIOS...which won't work) the main barrier to transportability
would be that each URL parser may handle syntactically-incorrect strings
differently.

> For NetworkExplorer even the NetworkExplorer code should encode those
> paths.

Should?  I'm lost...

> I think the key here is this very simple premise: You get out what you
> put in.

Perhaps there should be two methods then.  One that returns what the user
put in, and one that returns the corrected field.

> If you pass in URL encoding and call toString or getPath or
> getCanonicalPath ...etc, you get back whatever was encoded. Of course
> when you parse it internally you decode any escapes to get to each
> component and the raw path used to pass to SMB operations.

...but when you get a string with syntax errors you have to do some
interpretation.  I think it would be worth-while to hand back anything
that had to be figured out.

> > If I type in something like
> >
> >   "smb://192.168.101.16/c/My Download Files"
> >
> > I would expect that a method that returns the URL string resulting from
> > the parsing of that input would give me:
> >
> >   smb://192.168.101.16/c/My%20Download%20Files
> >
> > ...which is the correct form of the URL.  Notice that no semantic
> > translations are done (i.e., it didn't replace the IP address with the
> > NetBIOS or DNS name).  Only syntactic corrections are made.
> 
> Not sure what you're getting at here. If you are suggesting that
> toString, getPath, and/or getCanonicalPath should return a URL encoded
> path, that is not right. I don't think any URL specification would say
> that is necessary unless the path were transmitted over a network
> in an HTTP request or similar that is completely independent of client
> string handling.

The string:

  "smb://192.168.101.16/c/My Download Files"

would be an invalid URL.  The spaces in the path are syntax errors.
[http://www.faqs.org/rfcs/rfc2396.html, Section 2.4.3.].  One way to
"fix" that string is to remove the whitespace [same ref, appendix E],
which would give:

  "smb://192.168.101.16/c/MyDownloadFiles"

Another way to do it is to assume that the spaces are really spaces and
read them that way.  The problem, of course, is that syntax errors like
that are open to interpretation.  On the other hand, the handling of

  smb://192.168.101.16/c/My%20Download%20Files

is clearly defined.
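
As an illustration only, here is how the standard java.net.URI class
treats the two strings.  The class SpaceHandlingDemo and its method
names are made up for this example.  The single-argument URI
constructor rejects the string with spaces, while the multi-argument
constructor takes decoded components and quotes illegal characters
itself.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical demo class; only java.net.URI is a real API here. */
class SpaceHandlingDemo {

    /** True if java.net.URI accepts the string exactly as written. */
    static boolean parsesStrictly(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Builds the URL string from a decoded host and path, escaping as needed. */
    static String encode(String host, String path) {
        try {
            // URI(scheme, host, path, fragment) quotes illegal characters.
            return new URI("smb", host, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Returns the decoded path of an already valid URL string. */
    static String decodedPath(String url) {
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        System.out.println(parsesStrictly("smb://192.168.101.16/c/My Download Files"));
        System.out.println(encode("192.168.101.16", "/c/My Download Files"));
        System.out.println(decodedPath("smb://192.168.101.16/c/My%20Download%20Files"));
    }
}
```

The encoded form round-trips: decoding its path yields the name with
real spaces, which is what would be handed to the SMB operations.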

Fortunately for the users (and unfortunately for programmers) it is
possible to guess at what the user actually meant so a lot of programs
(Netscape, Mozilla, etc.) try to be 'helpful'.  Some folks say that this
has made users lazy, but I think we are dealing with a very complex and
important user interface and that being accommodating is reasonable.

Anyway, if I put the string with spaces into an SMB URL object then I
imagine two ways in which the object might return the string when asked.

1) as you suggest, it would be the same as what was given.
2) the corrected version, properly escaped.

Personally, I would prefer to see the latter echoed back to me by a
browser or other tool, but then I'm pedantic.  Since jCIFS is a toolkit,
providing methods for both means that the application writer can decide
for themselves what it is they want to do.
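
To make the two-method idea concrete, here is a minimal sketch.  It is
not the jCIFS API: SmbUrlSketch, getOriginal(), getCorrected() and
wasCorrected() are names invented for this example.  It assumes that
an input which already parses as a URL is left alone, and that
anything else is split by hand and treated as literal, unescaped text.
It only handles the plain smb://host/path shape (no user info or
port).

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical wrapper that remembers the input and a syntactically corrected form. */
final class SmbUrlSketch {
    private static final String PREFIX = "smb://";

    private final String original;
    private final String corrected;

    SmbUrlSketch(String input) {
        this.original = input;
        this.corrected = correct(input);
    }

    private static String correct(String input) {
        try {
            // Already syntactically valid: nothing to figure out.
            return new URI(input).toString();
        } catch (URISyntaxException notStrict) {
            // Fall through and interpret the string leniently.
        }
        if (!input.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not an smb:// URL: " + input);
        }
        int slash = input.indexOf('/', PREFIX.length());
        String host = slash < 0 ? input.substring(PREFIX.length())
                                : input.substring(PREFIX.length(), slash);
        String path = slash < 0 ? "" : input.substring(slash);
        try {
            // Assumption: every character here is literal, so a '%' in a
            // string that failed strict parsing is escaped as %25.
            return new URI("smb", host, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** 1) What the user put in. */
    String getOriginal() { return original; }

    /** 2) The corrected version, properly escaped. */
    String getCorrected() { return corrected; }

    /** True if the parser had to interpret the input to make it valid. */
    boolean wasCorrected() { return !original.equals(corrected); }
}
```

An application that wants "you get out what you put in" calls
getOriginal(); one that wants the transportable form calls
getCorrected(), and wasCorrected() reports whether any interpretation
was needed.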

Chris -)-----

-- 
Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org