smb://

Sat Dec 30 00:47:22 GMT 2000

On Fri, 29 Dec 2000, Kevin Colby wrote:

> > > The backslash character is almost universally used as an escape
> > > character; that plus the fact that the RFC for URIs specifically
> > > excludes it in URIs makes it an unsuitable choice.

> > Yes, the RFC specifically excludes its use.  But RFCs are intended to be
> > guides, not laws, and occasionally the RFCs need to be revised.  I think
> > this may be an RFC that could use a footnote.

> This is a legitimate criticism.  The RFC does not have to be followed.

> I have to question the motive here, though.  If the only dispute were
> whether to use ";" or "\", and ";" is RFC-compliant, and "\" is not,
> why should we use the latter and possibly amend the RFC?  Because NT
> users are more familiar with it?  That seems a very poor reason for
> such drastic action.

Lacking any concrete reason why the backslash should be disallowed by the
RFC, I do think that user familiarity is enough of a reason to insist on using
'\'.  Perhaps more important, however, is the fact that the RFC suggests the
"accepted" use of the ";" character would imply the syntax
<user>[;domain=<domain>], *NOT* [<Domain>;]<user>, which is what we're looking
for.  This means that out of the currently available reserved characters that
can be used in the authority component of a URI, "@" and ":" cannot be used
unambiguously as separators, and ";" cannot be used without overloading the
syntactic meaning it carries in URIs of other schemes.

According to the RFC, the userinfo component is defined as:

userinfo      = *( unreserved | escaped |
                   ";" | ":" | "&" | "=" | "+" | "$" | "," )

and 'unreserved' contains the following chars in addition to alphanum:

mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

so if you would prefer, one of these other 'special' characters could be used
as the delimiter, assuming it's also considered an invalid character in both
a Windows username and an NT domain name.  I imagine that the "=" and "+"
characters are undesirable because they would be coupled with ";" in the
construction ";option1=value;option2=multi+word+value".  "-" and "_" are bad
because they could appear in the userinfo portion of URLs that point to
Unix-based SMB servers.  That still leaves a fair number of RFC-compliant,
non-alphanum chars to choose from.  Which among these, if any, will conflict
with the charset used in Windows usernames and/or NT domain names, I have no
idea. :)  Also, it may be possible to reclaim ":" for the delimiter.  I recall
that Richard has said he doesn't plan to carry passwords around in URLs, and
the RFC says this practice is not recommended, so if there's consensus that
passwords-in-URLs are bad enough to never be used, then the ":" is again
available if we want it.
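
To make the elimination above concrete, here is a minimal sketch in Python.
It starts from the non-alphanumeric characters the quoted grammar allows
unescaped in userinfo and removes the ones argued against in this thread.
The names (MARK, USERINFO_EXTRA, EXCLUDED, candidate_delimiters) are my own
for illustration, and the sketch says nothing about which survivors are legal
in Windows usernames or NT domain names; that question is still open.

```python
# Non-alphanumeric characters allowed unescaped in userinfo,
# taken from the 'mark' and 'userinfo' rules quoted above.
MARK = set("-_.!~*'()")
USERINFO_EXTRA = set(";:&=+$,")
ALLOWED = MARK | USERINFO_EXTRA

# Characters ruled out in this thread, with the reason given for each.
# (Illustrative table; ":" could be dropped from it if passwords in URLs
# are abandoned entirely.)
EXCLUDED = {
    ";": "would overload the ;name=value parameter syntax",
    ":": "separates user from password",
    "=": "used inside ;option=value pairs",
    "+": "used inside multi+word+value",
    "-": "may appear in Unix usernames",
    "_": "may appear in Unix usernames",
}


def candidate_delimiters(allowed=ALLOWED, excluded=EXCLUDED):
    """Return the allowed characters that have not been ruled out, sorted."""
    return sorted(allowed - set(excluded))


if __name__ == "__main__":
    print(" ".join(candidate_delimiters()))
```

Removing ":" from EXCLUDED shows what the list looks like if we do reclaim it.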

Oh... and lest anyone worry overly much about total RFC-compliance, I'll refer
to section 3.2.2:

   The host is a domain name of a network host, or its IPv4 address as a
   set of four decimal digit groups separated by ".".  Literal IPv6
   addresses are not supported.

      hostport      = host [ ":" port ]
      host          = hostname | IPv4address
      hostname      = *( domainlabel "." ) toplabel [ "." ]
      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum

      IPv4address   = 1*digit "." 1*digit "." 1*digit "." 1*digit
      port          = *digit

   Hostnames take the form described in Section 3 of [RFC1034] and
   Section 2.1 of [RFC1123]: a sequence of domain labels separated by
   ".", each domain label starting and ending with an alphanumeric
   character and possibly also containing "-" characters.  The rightmost
   domain label of a fully qualified domain name will never start with a
   digit, thus syntactically distinguishing domain names from IPv4
   addresses, and may be followed by a single "." if it is necessary to
   distinguish between the complete domain name and any local domain.
   To actually be "Uniform" as a resource locator, a URL hostname should
   be a fully qualified domain name.  In practice, however, the host
   component may be a local domain literal.

This implies that the only acceptable values for 'host' are an IPv4
quad-dotted address, or a DNS-based hostname (FQDN or not).  So by using
NetBIOS names at all, we may already be outside the RFC as it's currently
written. :)
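
As a rough illustration, the host grammar quoted above can be transcribed
into a regular expression and tried against a few names.  This is only a
sketch of the syntax rules: the function name is_rfc2396_host and the sample
names are hypothetical, and a name that passes the syntax check is still not
necessarily a DNS name, which is the real point above.

```python
import re

# Direct transcription of the quoted rules:
#   domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
#   toplabel    = alpha    | alpha    *( alphanum | "-" ) alphanum
#   hostname    = *( domainlabel "." ) toplabel [ "." ]
#   IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
DOMAINLABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
TOPLABEL = r"[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
HOSTNAME = rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}\.?"
IPV4ADDRESS = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"
HOST_RE = re.compile(rf"(?:{HOSTNAME}|{IPV4ADDRESS})")


def is_rfc2396_host(name):
    """True if name matches the 'host' rule as quoted (syntax only)."""
    return HOST_RE.fullmatch(name) is not None


if __name__ == "__main__":
    # Hypothetical sample names, chosen only to exercise the grammar.
    for name in ["files.example.com", "fileserver", "192.168.1.10",
                 "MY_SERVER", "PRINT SRV", "4SERVER"]:
        print(f"{name!r}: {is_rfc2396_host(name)}")
```

Names containing "_" or a space, or a single label starting with a digit,
fail the check even though they are the kind of thing people give machines.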

Steve Langasek
postmodern programmer