[jcifs] SMB URL parsing buglet?

Thu Sep 5 18:36:29 EST 2002

On Thu, 05 Sep 2002 00:54:16 -0500
"Christopher R. Hertel" <crh at ubiqx.mn.org> wrote:

> > > Mike,
> > >
> > > java List smb://192.168.101.16/c/My%20Download%20Files
> > >
> > > dumps an exception, but
> > >
> > > java List "smb://192.168.101.16/c/My Download Files"
> > >
> > > works fine.  The % escapes should work, though.
> > >
> >         True. I believe we determined it was only necessary to URL
> >         encode the authority component before the '@' (e.g. another '@').
> >         So if it's not necessary to URL encode path names I don't bother
> >         to try and decode them for the sake of performance. But with
> >         applications like NetworkExplorer it would be a great idea to
> >         URL decode these.
> 
> I don't think I would have determined that.
> 
> The string form of the URL, when displayed, should always be presented as
> fully encoded.

What do you mean by "presented"? Surely you don't unconditionally encode
it whenever, for example, toString() is called.

>  True, some users will type the URLs in with "illegal"
> characters and it is nice if the code can cope.  It's also true that
> different parts of the URL string will require that different characters are
> encoded.  The '@', for instance, needs to be encoded if it is *not* used as
> a delimiter in the authority component.  It probably doesn't need
> to be encoded in the path (though I haven't checked the syntax to be sure).

Right, I think '@' is the real culprit, and it is the only character that
really needs to be encoded, without regard for special characters reserved
by CIFS (I can't think of any clashes there). And of course you have to
encode '%' to escape the encoding itself.
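
To make the '@' and '%' point concrete, here is a minimal sketch of an
escaper that touches only those two characters. The class name
MinimalSmbEscaper, the user name, and the password are made up for
illustration; none of this is jCIFS code, and the decoder assumes each
escape is a single byte.

```java
// Hypothetical helper for illustration only; not part of jCIFS.
final class MinimalSmbEscaper {

    /** Escapes '%' and '@' and leaves every other character untouched. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                sb.append("%25");   // escape the escape character itself
            } else if (c == '@') {
                sb.append("%40");   // '@' would otherwise end the user info
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Decodes every well-formed %XX escape; malformed ones are kept as-is. */
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    sb.append((char) (hi * 16 + lo)); // single-byte assumption
                    i += 2;
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical credentials: the password contains both '@' and '%'.
        String userInfo = "mike:" + escape("p@ss%word");
        System.out.println("smb://" + userInfo + "@192.168.101.16/c/");
    }
}
```

With this, only the last '@' in the authority is left unescaped, so a
parser can split the user info from the host on it.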

But otherwise I cannot really think of a reason to encode anything at all.
As for NetworkExplorer, the NetworkExplorer code itself should encode those
paths.

I think the key here is this very simple premise: you get out what you put
in. If you pass in URL encoding and call toString, getPath, or
getCanonicalPath, etc., you get back whatever was encoded. Of course, when
you parse it internally you decode any escapes to get to each component and
the raw path that is passed to SMB operations.
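
A rough sketch of that premise, assuming a hypothetical class called
RoundTripSmbUrl (this is not how SmbFile is written, and it skips parsing
the authority entirely): the string passed in is stored and returned
verbatim, and a decoded copy of the path is kept for internal use.

```java
// Hypothetical illustration of "you get out what you put in"; not jCIFS code.
final class RoundTripSmbUrl {
    private final String original; // exactly what the caller passed in
    private final String rawPath;  // decoded path for internal operations

    RoundTripSmbUrl(String url) {
        if (!url.startsWith("smb://")) {
            throw new IllegalArgumentException("not an smb URL: " + url);
        }
        this.original = url;
        // Everything after the authority is treated as the path.
        int slash = url.indexOf('/', "smb://".length());
        String path = slash < 0 ? "/" : url.substring(slash);
        this.rawPath = decode(path);
    }

    /** Returns the URL exactly as supplied, escapes and all. */
    @Override
    public String toString() {
        return original;
    }

    /** Returns the path with %XX escapes decoded. */
    String getRawPath() {
        return rawPath;
    }

    // Decodes well-formed %XX escapes (single bytes only); others are kept.
    private static String decode(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    sb.append((char) (hi * 16 + lo));
                    i += 2;
                    continue;
                }
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        RoundTripSmbUrl u =
            new RoundTripSmbUrl("smb://192.168.101.16/c/My%20Download%20Files");
        System.out.println(u);              // same string that went in
        System.out.println(u.getRawPath()); // decoded path
    }
}
```

Both spellings of the URL from the original report end up with the same
decoded path, while toString still hands back whichever form was typed.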

> 
> Each part of the string has a different set of reserved characters, which is
> why it is so hard for a user to get it right, and why browsers are so
> forgiving.  Of course, when some pedantic fool like me *does* get it right,
> then it should work.  :)
> 
> Decoding a URL string from the command line only happens when the user hits
> return, so I don't see it as a performance issue.  I imagine that it would
> be best (and I haven't looked at the code recently either, so forgive me if
> I'm talking through my elbow) if the URL object kept both the presentation
> form and the decoded form of each field handy, or kept NULL for the
> presentation form until it was requested (that is, cache it).
> 
> If I type in something like 
> 
>   "smb://192.168.101.16/c/My Download Files"
> 
> I would expect that a method that returns the URL string resulting from
> the parsing of that input would give me:
> 
>   smb://192.168.101.16/c/My%20Download%20Files
> 
> ...which is the correct form of the URL.  Notice that no semantic
> translations are done (i.e., it didn't replace the IP address with the
> NetBIOS or DNS name).  Only syntactic corrections are made.

Not sure what you're getting at here. If you are suggesting that toString,
getPath, and/or getCanonicalPath should return a URL-encoded path, that is
not right. I don't think any URL specification would say that is necessary
unless the path were transmitted over a network in an HTTP request or
similar, which is completely independent of client string handling.
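
For comparison only, the standard java.net.URI class (Java 1.4 and later,
not part of jCIFS) exposes both forms side by side. The sketch below, with
a made-up wrapper class UriForms, shows the encoded presentation form next
to the decoded path for the URL in question.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical wrapper around java.net.URI; for illustration only.
final class UriForms {

    /** Builds the encoded string form; the constructor quotes illegal characters. */
    static String encoded(String host, String path) {
        try {
            return new URI("smb", host, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Parses an already-encoded URL and returns its decoded path. */
    static String decodedPath(String url) {
        try {
            return new URI(url).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String url = encoded("192.168.101.16", "/c/My Download Files");
        System.out.println(url);
        System.out.println(decodedPath(url));
    }
}
```

Note that new URI(String) rejects a literal space, so the unencoded
spelling has to go through the multi-argument constructor.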