[jcifs] Creating file with hash ('#') in filename

Christopher R. Hertel crh at ubiqx.mn.org
Wed Jan 15 09:08:16 EST 2003


On Tue, Jan 14, 2003 at 03:47:12PM -0500, Michael B. Allen wrote:
:
> No, I mean URL decode. Currently we only un-escape the user info
> component. Thus the problem. I do not want to un-escape the path because
> we want users to be able to have a path like:
> 
>    "smb://server/share/path/to/me @ the beach.jpg"
> 
> without requiring the spaces and '@' to be escaped.

Well, the URL is parsed first and un-escaped last.  Each field has a 
(slightly) different set of disallowed characters.  In general, the '/' 
must always be escaped if used as a literal character, because it sits at 
the highest level of the syntax tree.  If it is not being used to delimit 
the user_info field, the 
'@' must be escaped within the server field because it is a lexical token 
within that field.

Regarding the path, RFC2396 says that the following are allowed:

      pchar         = unreserved | escaped |
                      ":" | "@" | "&" | "=" | "+" | "$" | ","

So the '@' is permitted but *the spaces are not*.  Actually, the status of
spaces is not really clear to me from the given syntax.  They are not
in the unreserved set, however, so the URL above is not a valid URL (but
you could probably fudge it).  The spaces also present another problem,
which you quietly side-stepped.  Most command line interpreters will need
to have the URL string in quotation marks as shown, or will need to have 
the spaces escaped, in order to read it.

The "right" (also in quotation marks) way to enter the above URL would be:

  smb://server/share/path/to/me%20@%20the%20beach.jpg

I think that a "smart" user agent could probably handle the URL as you 
gave it above.
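
As a rough illustration of producing the escaped form from the unescaped 
one, here is a minimal Java sketch.  It leans on the multi-argument 
java.net.URI constructor, which quotes characters that are not legal in a 
path (the space among them) while leaving '@' and '/' alone.  The class and 
method names are made up for the example and are not part of jCIFS.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EscapePathSketch {
    // Build an smb URL string from an unescaped host and path.
    // The multi-argument URI constructor percent-escapes characters
    // that are illegal in the path component, such as the space.
    static String toEscapedUrl(String host, String path) {
        try {
            return new URI("smb", host, path, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // prints smb://server/share/path/to/me%20@%20the%20beach.jpg
        System.out.println(toEscapedUrl("server", "/share/path/to/me @ the beach.jpg"));
    }
}
```

A "smart" user agent could apply something like this to forgiving input 
before handing the string to the strict parser.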

> > So, if the user *did* add a #ref, then it's a user error.  If the # is 
> > escaped, then you would simply need to un-escape the string and *poof*, 
> > you have the intended path.
> 
> The java.net.URL class parses the URL *before* the jcifs.smb.Handler
> gets it. So the '#ref' is getting picked out. I was just saying perhaps
> I can append it back on to create an internal path that retains it.

Only if you want to bypass the standard syntax for URLs.  :)
The # (if unescaped) in that position should be a delimiter and the
pedantic way to handle it is to cough back an error.  It is
*syntactically* correct, but there are no *semantics* defined for it in
the SMB URL so it would be a *semantic* error.

Of course, patching it back together would be a convenience for the user 
(until such time as a workable meaning exists for the #ref syntax under 
the SMB URL...and then...).
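
To make the delimiter behaviour concrete, here is a small Java sketch.  It 
uses java.net.URI as a stand-in for java.net.URL, since URL needs a 
registered handler for the smb scheme; the splitting at '#' is the part 
being illustrated.  The file names and the helper pathWithRef are invented 
for the example, and the "patch it back together" step is only one possible 
way to do it, not what jCIFS actually does.

```java
import java.net.URI;

public class FragmentSketch {
    // Raw path as the generic parser sees it: everything before an unescaped '#'.
    static String rawPath(String url) {
        return URI.create(url).getRawPath();
    }

    // Fragment ("ref") split off by the parser, or null if there is none.
    static String ref(String url) {
        return URI.create(url).getRawFragment();
    }

    // The convenience option: glue the ref back onto the path.
    static String pathWithRef(String url) {
        String r = ref(url);
        return r == null ? rawPath(url) : rawPath(url) + "#" + r;
    }

    public static void main(String[] args) {
        String bare = "smb://server/share/file#1.txt";
        String escaped = "smb://server/share/file%231.txt";

        System.out.println(rawPath(bare));      // /share/file
        System.out.println(ref(bare));          // 1.txt
        System.out.println(pathWithRef(bare));  // /share/file#1.txt

        System.out.println(rawPath(escaped));   // /share/file%231.txt
        System.out.println(ref(escaped));       // null
        // Un-escaping the path recovers the intended name.
        System.out.println(URI.create(escaped).getPath()); // /share/file#1.txt
    }
}
```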

> Otherwise I have to un-escape the path component which is slow, requires
> more code for this one little use-case, requires that '%' be escaped,
> etc. Or you require everything be escaped in which case you have to
> re-escape everything before doling out new URLs. etc. etc. etc. Just
> more problems.

...but that's the right way to do it.

Actually, there may be an easy way out.  Just don't un-escape the URL 
instance variables.  Keep the URL itself (as parsed and maintained within 
the java.net.URL instance) in the format provided on the command-line.  
Un-escape it whenever you copy it into jCIFS space for use with SMB.

...and yes, if a path or server or whatever includes a '%' sign it should 
be escaped on the command line.
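
A minimal sketch of that "un-escape at the boundary" idea in Java follows.  
The URL keeps its escaped form, and a helper is applied only when a 
component is copied out for SMB use.  The helper name unescape is 
hypothetical, and decoding the escaped octets as UTF-8 is an assumption 
made for the example; RFC 2396 does not fix a character encoding.  Note 
that java.net.URLDecoder is deliberately not used here, because it also 
turns '+' into a space, which is form-encoding behaviour rather than URL 
path un-escaping.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class UnescapeSketch {
    // Value of one hex digit, or -1 if the byte is not a hex digit.
    private static int hex(byte b) {
        if (b >= '0' && b <= '9') return b - '0';
        if (b >= 'a' && b <= 'f') return b - 'a' + 10;
        if (b >= 'A' && b <= 'F') return b - 'A' + 10;
        return -1;
    }

    // Replace each %XX escape with the octet it names; copy everything else.
    // Assumption: the resulting octets are interpreted as UTF-8.
    static String unescape(String s) {
        byte[] in = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            if (in[i] != '%') {
                out.write(in[i]);
                continue;
            }
            if (i + 2 >= in.length) {
                throw new IllegalArgumentException("truncated escape at index " + i);
            }
            int hi = hex(in[i + 1]);
            int lo = hex(in[i + 2]);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("bad escape at index " + i);
            }
            out.write((hi << 4) | lo);
            i += 2; // skip the two hex digits just consumed
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(unescape("/share/path/to/me%20@%20the%20beach.jpg"));
        System.out.println(unescape("/share/100%25.txt"));
    }
}
```

Because the stored URL is never modified, there is nothing to re-escape 
when new URLs are derived from it.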

If this URL stuff is going to work, it all has to fit the general syntax
of URLs.  It is "okay" to try and figure out what the user actually meant
if they make obvious syntax errors.  Things like including spaces in the 
URL...

One more pedantic note:  The generic syntax for URLs does not allow 
escaped characters in the server name:

      hostname      = *( domainlabel "." ) toplabel [ "." ]
      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum

That's old DNS syntax (which, in theory, is incorrect--I should point that
out to the RFC2396 folks).  :)

On the other hand, try this:

  http://www.ietf.org/rfc/r%66c2396.txt

...that works.  :)
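
For anyone who wants to experiment with the hostname grammar quoted above, 
here is a rough Java translation of those three rules into a regular 
expression.  It covers only the hostname production (not IPv4 addresses, 
which RFC 2396 handles with a separate rule), and the class and method 
names are invented for the example.

```java
import java.util.regex.Pattern;

public class HostnameSketch {
    private static final String ALNUM = "[A-Za-z0-9]";
    // domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
    private static final String DOMAINLABEL =
            ALNUM + "(?:[A-Za-z0-9-]*" + ALNUM + ")?";
    // toplabel = alpha | alpha *( alphanum | "-" ) alphanum
    private static final String TOPLABEL =
            "[A-Za-z](?:[A-Za-z0-9-]*" + ALNUM + ")?";
    // hostname = *( domainlabel "." ) toplabel [ "." ]
    private static final Pattern HOSTNAME =
            Pattern.compile("(?:" + DOMAINLABEL + "\\.)*" + TOPLABEL + "\\.?");

    // True if the whole string matches the RFC 2396 hostname production.
    static boolean isHostname(String s) {
        return HOSTNAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isHostname("www.ietf.org"));  // true
        System.out.println(isHostname("my%20server"));   // false: no escapes allowed
        System.out.println(isHostname("my_server"));     // false: '_' is not alphanum
    }
}
```

The last case shows why the old DNS syntax is a poor fit for SMB server 
names, which can contain characters this grammar rejects.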

Chris -)-----

-- 
Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org


