[jcifs] Creating file with hash ('#') in filename
Christopher R. Hertel
crh at ubiqx.mn.org
Mon Jan 20 10:04:26 EST 2003
"Allen, Michael B (RSCH)" wrote:
:
> Ok, well I'm not really sure what you're trying to get at. But I
> think we're both being a little pedantic in this thread.
:)
Yes, but it is that kind of question. We are probably not the only ones
debating it either, but I don't have the cycles to find other references.
Sigh.
>
> Let's review what we agree on. The characters that are required to be
> escaped in the SMB URL for RFC2396 conformance are:
>
> ' |#%^`{}'
>
> and non-ASCII characters. However because SMB path names support
> Unicode, how these characters would be escaped is not clear. If each
> character was converted to a UTF-8 multibyte sequence and each byte in
> turn were escaped the frequency and appearance of these URLs would make
> the process unreasonable and for many scripts (e.g. Cyrillic) they would
> be pathologically unusable.
>
> That's the problem. Right?
Yes. Good summary, but I will add one more problem to the pile.
(...which I think you were the one to point out.)
In order to escape the non-ASCII characters, there must be an agreed-upon
encoding. Consider this: %12%AB
Is that 0x12AB or 0xAB12? In UCS2LE encoding, it would be the latter.
How would the same value be represented in UTF-8?
So, without a standard for Unicode escape sequences within a URL, there
really is no good way to escape those characters anyway. At this point,
until we find such a standard (if, indeed, such a standard exists) the
only way to handle non-ASCII characters is literally. In some contexts,
that may mean converting from the character set in use on the terminal.
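To make the %12%AB ambiguity concrete, here is a rough sketch in Java. The
class and method names are made up for illustration and are not part of
jCIFS. It reads the same two octets as a big-endian and as a little-endian
16-bit unit, then shows what each resulting character would look like if
the (assumed) convention were to escape its UTF-8 bytes instead.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in jCIFS.
class EscapeAmbiguity {
    // Interpret the two escaped octets as one big-endian 16-bit unit.
    static int asBigEndian(byte[] octets) {
        return new String(octets, StandardCharsets.UTF_16BE).codePointAt(0);
    }

    // Interpret the same octets as one little-endian 16-bit unit.
    static int asLittleEndian(byte[] octets) {
        return new String(octets, StandardCharsets.UTF_16LE).codePointAt(0);
    }

    // Assumed convention: percent-escape every byte of the UTF-8 form.
    static String utf8Escape(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] octets = { 0x12, (byte) 0xAB };   // the octets behind "%12%AB"
        int be = asBigEndian(octets);
        int le = asLittleEndian(octets);
        System.out.printf("big-endian:    U+%04X -> %s%n", be, utf8Escape(be));
        System.out.printf("little-endian: U+%04X -> %s%n", le, utf8Escape(le));
    }
}
```

The two readings give U+12AB and U+AB12, whose UTF-8 escapes would be
%E1%8A%AB and %EA%AC%92 respectively. Three different escape strings for
what might be "the same" character, depending on which convention the
reader assumes.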
> > I am suggesting that an implementation, such as jCIFS, may safely break
> > this rule.
> >
> At the moment we do not have much choice. CIFS is a Unicode protocol.
> We MUST provide a way to escape the escaping. However you do realise
> that URLs with Unicode characters cannot be embedded into web pages and
> other similar places one might find them because it may be assumed that
> all characters in any URL are ASCII?
Yes, that's a problem, though web pages can be assigned a character encoding
as well. I imagine that there is a work-around, but I have no idea what it
is.
> I think we need to investigate the state of escaping Unicode in URLs.
Yes. ...and also of presenting un-escaped Unicode characters in URLs if the
encoding is known.
> Certainly it has been discussed and implemented in one form or another.
> Is there a standard for it?
That's what I don't know. I'm over my head in authentication right now and
have not had time to look.
> > represent kanji in its current settings. This new problem (which you
> > correctly bring up) is that I now need to enter escapes in order to
> > connect to a server offering files with kanji names. Ouch. Which
> > encoding do I use? UTF-8? UCS2LE?
> >
> > I don't have an answer, but it's a good question.
> >
> Well we're not really concerned with an "encoding" because we know what
> the encoding is going to be; ASCII.
I am using the term "encoding" to mean the mapping of a sequence of octets
to and from character representations. In other words, using "encoding" as
I'm using it, UTF-8 and UCS2LE would be different encodings because the same
character is mapped to different byte sequences and vice-versa. The old DOS
codepages would also be encodings.
I may be mangling the term. I don't know the correct terminology here.
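A small sketch of what I mean, in Java. The class name is invented for the
example, and ISO-8859-1 is only standing in for an old single-byte codepage
because it is guaranteed to be available in every Java runtime. It prints
the octets that one character maps to under three different encodings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only.
class EncodingDemo {
    // Return the octets of s under the given charset as spaced hex pairs.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ch = "\u00E9";   // LATIN SMALL LETTER E WITH ACUTE
        System.out.println("UTF-8:      " + hex(ch, StandardCharsets.UTF_8));
        System.out.println("UTF-16LE:   " + hex(ch, StandardCharsets.UTF_16LE));
        // Stand-in for a single-byte codepage (an assumption for this sketch).
        System.out.println("ISO-8859-1: " + hex(ch, StandardCharsets.ISO_8859_1));
    }
}
```

Same character, three different octet sequences (C3 A9, E9 00, and E9).
That is all I mean by "different encodings".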
> The question is more like how do you represent a value that can be
> between 0x80 and 0x10FFFF in a sequence of ASCII characters? But it's
> not like we can just make something up.
Right.
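For the sake of discussion, here is a sketch of the one candidate already
mentioned in this thread: convert to UTF-8 and escape each byte. This is
an assumption for illustration, not an agreed standard and not what jCIFS
does. The class and method names are hypothetical, and the ASCII escape
set is only the one listed earlier in this thread.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not jCIFS code and not a proposed standard.
class SmbUrlEscapeSketch {
    // ASCII characters the thread agrees must be escaped: ' |#%^`{}'
    private static final String MUST_ESCAPE = " |#%^`{}";

    // Escape the listed ASCII characters and every non-ASCII UTF-8 byte.
    static String escape(String path) {
        StringBuilder sb = new StringBuilder();
        for (byte b : path.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v >= 0x80 || MUST_ESCAPE.indexOf(v) >= 0) {
                sb.append(String.format("%%%02X", v));
            } else {
                sb.append((char) v);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String plain = "a#b.txt";
        String cyrillic = "\u041F\u0440\u0438\u0432\u0435\u0442";   // six Cyrillic letters
        System.out.println(escape(plain));
        System.out.println(cyrillic.length() + " chars -> "
                + escape(cyrillic).length() + " chars: " + escape(cyrillic));
    }
}
```

The hash comes out as %23, and the six Cyrillic letters become 36
characters of escapes, which is the readability problem described at the
top of this message. A value at the far end of the range, 0x10FFFF, would
take four escaped bytes under this scheme.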
This is a big kettle of worms. I think you're right that we won't find the
answers without talking to the folks who are already working on these
problems.
Chris -)-----
--
Samba Team -- http://www.samba.org/ -)----- Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/ -)----- ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/ -)----- crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/ -)----- crh at ubiqx.org