SMB URLs, was: [jcifs] Equivalent for java.io.File.getCanonicalPath()

Julian Reschke julian.reschke at gmx.de
Sat Jan 31 10:40:28 GMT 2004


Michael B Allen wrote:

>>>>3) URL handling: according to RFC2396, URLs never ever contain blanks or
>>>>non-ASCII characters. However, jCifs accepts those, returns them in this
>>>>format and the documentation even uses names with blanks in examples.
>>>>This probably should be cleaned up. I assume that the support can't be
>>>>removed due to backward compatibility reasons?
>>>
>>>
>>>No. JCIFS must support all characters supported by SMB paths or it just
>>>won't work. There are a few problems such as '#' in a path will be
>>
>>Of course; however, the question is *how* to do that. URLs by definition
>>contain only ASCII characters, so the SMB URL draft should specify how
>>non-ASCII characters (and reserved ASCII characters) should be encoded
>>in SMB URLs (even if the code continues to accept non-ASCII characters).
> 
> 
> The SMB URL draft will follow the W3C IRI initiative that specifies how to
> escape non-ASCII characters in a URL:
> 
>   http://www.w3.org/TR/charmod/#sec-URIs

OK, that's obviously the right thing to do. In that case it's really an 
IRI scheme, not a URL scheme (though one with a canonical (sic) mapping 
to URLs).

> JCIFS currently does not support this because we use the java.net.URL
> class for almost all URL handling. We have to, in order to support the Java
> URL protocol handler. In a future version of jCIFS the java.net.URI class will
> be used which automatically decodes URL escape sequences. However we will
> always support using Unicode URLs without escape sequences even if it
> means violating RFC2396, the IRI spec, and the SMB URL draft if that odd
> scenario should arise.
> 
> 
>>>interpreted by the java.net.URL parser as a reference. Otherwise, we try
>>
>>That seems correct. If you need a "#" in a URL path segment, you need to
>>escape it.
> 
> 
> But we don't decode escapes, so '#' (and, I believe, '%') is the cause of
> some grief.

It seems that at some point at least the documentation needs to be 
clarified, and possibly alternate constructors/factory methods that 
accept proper URLs should be added.
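To illustrate the '#' and '%' problem, and what a factory method that takes 
a raw path and produces a proper URL might look like, here is a minimal 
sketch. It uses java.net.URI rather than java.net.URL, because URL would need 
an installed smb protocol handler. The class name and the fromRawPath helper 
are hypothetical (not part of the jCIFS API), and "server"/"share" are 
placeholders.

```java
import java.net.URI;
import java.net.URISyntaxException;

class SmbPathEscapes {
    // Hypothetical factory (not part of jCIFS): builds an escaped smb: URI
    // from a raw, unescaped path. The four-argument java.net.URI constructor
    // quotes characters that are illegal in a path, such as ' ' and '#',
    // and always quotes '%'.
    static URI fromRawPath(String host, String rawPath) throws URISyntaxException {
        return new URI("smb", host, rawPath, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Parsing an unescaped string: '#' starts a fragment, so the path is cut short.
        URI naive = new URI("smb://server/share/a#b.txt");
        System.out.println(naive.getPath());     // /share/a
        System.out.println(naive.getFragment()); // b.txt

        // Building from components: '%', ' ' and '#' are escaped in the raw form
        // and decoded again by getPath().
        URI escaped = fromRawPath("server", "/share/50% off #2.txt");
        System.out.println(escaped.getRawPath()); // /share/50%25%20off%20%232.txt
        System.out.println(escaped.getPath());    // /share/50% off #2.txt
    }
}
```

The escaped form is what would go into a real URL; the decoded form is what 
would be sent to the server as the SMB path.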

>>>to comply with RFC2396 wherever possible but there are no plans to be
>>>completely compatible with it.
>>
>>Well, if the SMB draft is supposed to be accepted by the IETF, it simply
>>has to be compliant with the base spec.
> 
> 
> Well, I don't know what Chris is going to do. It's a tough position because
> 2396 just doesn't mate well with CIFS. CIFS is a WAN network filesystem,
> which means you are going to need to support Unicode and not just the
> locale-dependent encoding. JCIFS is not going to require hex escapes to
> encode non-ASCII characters.

In which case it won't be a URL.

I think the right thing to do (as you mentioned yourself) is to simply 
state that what's currently called an "SMB URL" is in fact an IRI, and 
that code that requires URIs should simply use the default 
transformation (UTF-8 encoding, then hex-escaping).
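A minimal sketch of that default transformation, assuming the input is 
already a valid IRI string: every non-ASCII character is encoded as UTF-8 and 
each resulting byte is written as %XX, while ASCII characters are passed 
through unchanged (so a literal '#' in a path must already have been escaped 
by the caller). The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class IriToUri {
    // Hypothetical helper: map an IRI to a URI by UTF-8 encoding the whole
    // string and percent-escaping every byte >= 0x80. In UTF-8, ASCII
    // characters encode to single bytes < 0x80, so they are left as-is.
    static String iriToUri(String iri) {
        StringBuilder out = new StringBuilder();
        for (byte b : iri.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80) {
                out.append((char) v);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Gr\u00fc\u00dfe.txt" is a file name containing u-umlaut and sharp s.
        System.out.println(iriToUri("smb://server/share/Gr\u00fc\u00dfe.txt"));
        // smb://server/share/Gr%C3%BC%C3%9Fe.txt
    }
}
```

For an existing java.net.URI instance, toASCIIString() performs the same 
UTF-8 percent-encoding of non-ASCII characters.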

However, I'm not sure what this has to do with Unicode vs. locale 
encoding. The locale encoding should never affect identifiers on the 
wire. The "http" and "file" URI schemes have exactly the same 
requirements, and in fact the same issues. It would be good if these 
issues could be avoided when the SMB scheme (be it URI or IRI) is 
documented. (Is this the right list to discuss this? If not, please 
point me to the correct one.)
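To make the locale point concrete: if percent-escapes were derived from a 
locale charset instead of a fixed UTF-8, the same file name would produce 
different URIs on differently configured machines. A minimal sketch, with a 
hypothetical escape helper and ISO-8859-1 standing in for a Western European 
locale encoding:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LocaleVsUtf8 {
    // Hypothetical helper: encode the name with the given charset, keep the
    // RFC 3986 unreserved ASCII characters, and percent-escape every other byte.
    static String escape(String name, Charset cs) {
        StringBuilder out = new StringBuilder();
        for (byte b : name.getBytes(cs)) {
            int v = b & 0xFF;
            boolean unreserved = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9') || v == '-' || v == '.' || v == '_' || v == '~';
            if (unreserved) {
                out.append((char) v);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "Gr\u00fc\u00dfe.txt";
        System.out.println(escape(name, StandardCharsets.ISO_8859_1)); // Gr%FC%DFe.txt
        System.out.println(escape(name, StandardCharsets.UTF_8));      // Gr%C3%BC%C3%9Fe.txt
    }
}
```

Only the fixed UTF-8 variant yields an identifier that every client and 
server can agree on.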

Regards, Julian




-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760

