vorlon at netexpress.net
Sat Dec 30 00:47:22 GMT 2000
On Fri, 29 Dec 2000, Kevin Colby wrote:
> > > The backslash character is almost universally used as an escape
> > > character; that plus the fact that the RFC for URIs specifically
> > > excludes it in URIs makes it an unsuitable choice.
> > Yes, the RFC specifically excludes its use. But RFCs are intended to be
> > guides, not laws, and occasionally the RFCs need to be revised. I think
> > this may be an RFC that could use a footnote.
> This is a legitimate criticism. The RFC does not have to be followed.
> I have to question the motive here, though. If the only dispute were
> whether to use ";" or "\", and ";" is RFC-compliant, and "\" is not,
> why should we use the latter and possibly amend the RFC? Because NT
> users are more familiar with it? That seems a very poor reason for
> such drastic action.
Lacking any concrete reason why the backslash should be disallowed by the
RFC, I do think that user familiarity is enough of a reason to insist on using
'\'. Perhaps more important, however, is the fact that the RFC suggests the
"accepted" use of the ";" character would imply the syntax
<user>[;domain=<domain>], *NOT* [<Domain>;]<user>, which is what we're looking
for. This means that out of the currently available reserved characters that
can be used in the authority component of a URI, "@" and ":" cannot be used
unambiguously as separators, and ";" cannot be used without overloading the
syntactic meaning it carries in URIs of other schemes.
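To make the overloading concern concrete, here is a rough Python sketch of the two userinfo layouts discussed above. The function names and the sample values ("alice", "MYDOM") are made up for illustration and come from no existing implementation. A parser written for the parameter style reads the prefix style's domain as the user name.

```python
def parse_param_form(userinfo):
    """Read userinfo as <user>[;domain=<domain>] (parameter style)."""
    user, _, params = userinfo.partition(";")
    domain = None
    if params:
        for param in params.split(";"):
            key, _, value = param.partition("=")
            if key.lower() == "domain":
                domain = value
    return domain, user


def parse_prefix_form(userinfo):
    """Read userinfo as [<Domain>;]<user> (prefix style)."""
    head, sep, tail = userinfo.partition(";")
    if sep:
        return head, tail
    return None, head


# Each parser handles its own layout as intended.
print(parse_param_form("alice;domain=MYDOM"))
print(parse_prefix_form("MYDOM;alice"))

# The same string read by the other parser gives a different answer:
# the domain is taken for the user name and "alice" becomes an
# unrecognised parameter.
print(parse_param_form("MYDOM;alice"))
```

The point is only that a single string cannot carry both meanings of ";" at once. Which layout a real client should accept is exactly what is under discussion.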
According to the RFC, the userinfo component contains

    userinfo = *( unreserved | escaped |
                  ";" | ":" | "&" | "=" | "+" | "$" | "," )

and 'unreserved' contains the following chars in addition to alphanum:

    mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

so if you would prefer, one of these other 'special' characters could be used
as the delimiter, assuming it's also considered an invalid character in both
a Windows username and an NT domain name. I imagine that the "=" and "+"
characters are undesirable because they would be coupled with ";" in the
construction ";option1=value;option2=multi+word+value". "-" and "_" are bad
because they could appear in the userinfo portion of URLs that point to
Unix-based SMB servers. That still leaves a fair number of RFC-compliant,
non-alphanum chars to choose from. Which among these, if any, will conflict
with the charset used in Windows usernames and/or NT domain names, I have no
idea. :) Also, it may be possible to reclaim ":" for the delimiter. I recall
that Richard has said he doesn't plan to carry passwords around in URLs, and
the RFC says this practice is not recommended, so if there's consensus that
passwords-in-URLs are bad enough to never be used, then the ":" is again
available if we want it.
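For anyone who wants the short list spelled out, here is a small Python sketch that starts from the userinfo and mark character sets quoted above and removes the characters argued against in this message. The variable names are mine. The script does not check Windows username or NT domain name rules, since that part is still an open question.

```python
# Non-alphanumeric characters the RFC allows in userinfo, as quoted above.
USERINFO_EXTRA = set(";:&=+$,")
MARK = set("-_.!~*'()")

# Characters argued against in this message, with the reason given.
RULED_OUT = {
    ";": "already carries the ;name=value meaning in other schemes",
    ":": "separates user and password (reclaimable if passwords are dropped)",
    "=": "pairs with ';' in ;option=value",
    "+": "pairs with ';' in multi+word+value",
    "-": "may appear in user names on Unix-based SMB servers",
    "_": "may appear in user names on Unix-based SMB servers",
}

candidates = sorted((USERINFO_EXTRA | MARK) - set(RULED_OUT))
print("".join(candidates))

# The backslash is absent from both sets, so it never enters the list.
print("\\" in (USERINFO_EXTRA | MARK))
```

Each remaining character would still have to be checked against what Windows permits in user and domain names before it could be proposed.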
Oh... and lest anyone worry overly much about total RFC-compliance, I'll refer
to section 3.2.2:

   The host is a domain name of a network host, or its IPv4 address as a
   set of four decimal digit groups separated by ".". Literal IPv6
   addresses are not supported.

      hostport    = host [ ":" port ]
      host        = hostname | IPv4address
      hostname    = *( domainlabel "." ) toplabel [ "." ]
      domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum
      toplabel    = alpha | alpha *( alphanum | "-" ) alphanum
      IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit
      port        = *digit

   Hostnames take the form described in Section 3 of [RFC1034] and
   Section 2.1 of [RFC1123]: a sequence of domain labels separated by
   ".", each domain label starting and ending with an alphanumeric
   character and possibly also containing "-" characters. The rightmost
   domain label of a fully qualified domain name will never start with a
   digit, thus syntactically distinguishing domain names from IPv4
   addresses, and may be followed by a single "." if it is necessary to
   distinguish between the complete domain name and any local domain.

   To actually be "Uniform" as a resource locator, a URL hostname should
   be a fully qualified domain name. In practice, however, the host
   component may be a local domain literal.

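As a rough illustration, the host grammar quoted above can be transcribed into a Python regular expression. This is my own transcription and not code from any existing client. The sample names are hypothetical, and the check is purely syntactic: it says nothing about whether a name resolves through DNS or NetBIOS.

```python
import re

# Direct transcription of the quoted grammar; ASCII classes only.
ALPHANUM = r"[A-Za-z0-9]"
DOMAINLABEL = rf"{ALPHANUM}(?:[A-Za-z0-9-]*{ALPHANUM})?"
TOPLABEL = rf"[A-Za-z](?:[A-Za-z0-9-]*{ALPHANUM})?"
HOSTNAME = rf"(?:{DOMAINLABEL}\.)*{TOPLABEL}\.?"
IPV4ADDRESS = r"[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+"
HOST_RE = re.compile(rf"(?:{HOSTNAME}|{IPV4ADDRESS})")


def matches_quoted_host_grammar(host):
    """Return True if host fits the 'host' production quoted above."""
    return HOST_RE.fullmatch(host) is not None


# Hypothetical sample names, chosen only to exercise the grammar.
for name in ["samba.org", "192.168.1.10", "FILESRV1", "FILE_SERVER", "MY SERVER"]:
    print(f"{name!r}: {matches_quoted_host_grammar(name)}")
```

Names made only of letters, digits and "-" happen to fit the grammar, while names containing an underscore or a space fall outside it.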
This implies that the only acceptable values for 'host' are an IPv4
quad-dotted address, or a DNS-based hostname (FQDN or not). So by using
netbios names at all, we may already be outside the RFC as it's currently