[jcifs] Creating file with hash ('#') in filename

Christopher R. Hertel crh at ubiqx.mn.org
Mon Jan 20 10:04:26 EST 2003


"Allen, Michael B (RSCH)" wrote:
:
> Ok, well I'm not really sure what you're trying to get at. But I
> think we're both being a little pedantic in this thread.

:)

Yes, but it is that kind of question.  We are probably not the only ones
debating it either, but I don't have the cycles to find other references.
Sigh.

> 
> Let's review what we agree on. The characters that are required to be
> escaped in the SMB URL for RFC2396 conformance are:
> 
>           ' |#%^`{}'
> 
> and non-ASCII characters. However, because SMB path names support
> Unicode, how these characters would be escaped is not clear. If each
> character were converted to a UTF-8 multibyte sequence and each byte in
> turn were escaped, the frequency of the escapes and the appearance of these
> URLs would make the process unreasonable, and for many scripts (e.g.
> Cyrillic) they would be pathologically unusable.
> 
>         That's the problem. Right?
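
A minimal Java sketch of the escaping described above, for illustration only. It assumes UTF-8 as the byte encoding (which is exactly the open question) and escapes only the ASCII characters listed in the quote; the class and method names are hypothetical and not part of jCIFS. A '#' in a file name becomes %23, while a six-letter Cyrillic name grows to 36 characters of escapes.

```java
import java.nio.charset.StandardCharsets;

public class SmbUrlEscape {
    // The ASCII characters listed in the thread as needing escapes.
    private static final String MUST_ESCAPE = " |#%^`{}";

    // Percent-escapes the listed ASCII characters and every non-ASCII
    // character. ASSUMPTION: non-ASCII characters go through UTF-8.
    static String escape(String path) {
        StringBuilder out = new StringBuilder();
        for (byte b : path.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            // Every byte of a multibyte UTF-8 sequence is >= 0x80.
            if (v >= 0x80 || MUST_ESCAPE.indexOf(v) >= 0) {
                out.append('%').append(String.format("%02X", v));
            } else {
                out.append((char) v);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a#b.txt")); // a%23b.txt

        // "Privet" written in Cyrillic, as Unicode escapes.
        String cyrillic = "\u041F\u0440\u0438\u0432\u0435\u0442";
        String escaped = escape(cyrillic);
        System.out.println(escaped); // %D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82
        System.out.println(cyrillic.length() + " chars -> " + escaped.length() + " chars");
    }
}
```

Each Cyrillic letter costs two UTF-8 octets, so six escaped characters per letter.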

Yes.  Good summary, but I will add one more problem to the pile.
(...which I think you were the one to point out.)

In order to escape the non-ASCII characters, there must be an agreed-upon
encoding.  Consider this:  %12%AB
Is that 0x12AB or 0xAB12?  In UCS2LE encoding, it would be the latter.
How would the same value be represented in UTF-8?
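
To make the byte-order question concrete, here is a small Java sketch (hypothetical class name) that decodes the two octets from %12%AB both ways and then shows the little-endian result as escaped UTF-8. Java has no UCS-2 charset, so UTF-16LE/BE stand in; for a single BMP character they produce the same bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteOrderAmbiguity {
    // Decodes two octets as one 16-bit code unit in the given charset.
    static int codeUnit(byte[] octets, Charset cs) {
        return new String(octets, cs).charAt(0);
    }

    // Writes each byte as a %XX escape.
    static String percentEscape(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The two octets from the escape sequence %12%AB.
        byte[] octets = { (byte) 0x12, (byte) 0xAB };

        int le = codeUnit(octets, StandardCharsets.UTF_16LE);
        int be = codeUnit(octets, StandardCharsets.UTF_16BE);
        System.out.printf("little-endian: U+%04X%n", le); // U+AB12
        System.out.printf("big-endian:    U+%04X%n", be); // U+12AB

        // The same character U+AB12 needs three octets in UTF-8.
        String ch = new String(Character.toChars(le));
        System.out.println("as UTF-8: " + percentEscape(ch.getBytes(StandardCharsets.UTF_8))); // %EA%AC%92
    }
}
```

So the one character reads as %12%AB in UCS2LE, %AB%12 in big-endian UCS-2, and %EA%AC%92 in UTF-8; without an agreed encoding, the escapes are ambiguous.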

So, without a standard for Unicode escape sequences within a URL, there
really is no good way to escape those characters anyway.  At this point,
until we find such a standard (if, indeed, such a standard exists) the
only way to handle non-ASCII characters is literally.  In some contexts,
that may mean converting from the character set in use on the terminal.

> > I am suggesting that an implementation, such as jCIFS, may safely break
> > this rule.
> >
> At the moment we do not have much choice. CIFS is a Unicode protocol.
> We MUST provide a way to escape the escaping. However, you do realise
> that URLs with Unicode characters cannot be embedded into web pages and
> other similar places one might find them, because it may be assumed that
> all characters in any URL are ASCII?
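
For comparison, this is what one general-purpose parser, Java's own java.net.URI, does with literal characters. A literal '#' silently truncates the path into a fragment. Its documentation describes accepting literal non-ASCII characters as a deviation from RFC 2396, and toASCIIString() turns them into UTF-8 escapes. This shows one library's behaviour, not a standard and not jCIFS; "server" and "share" are placeholders.

```java
import java.net.URI;

public class LiteralCharsInUri {
    public static void main(String[] args) {
        // A literal '#' starts a fragment, so the file name is cut short.
        URI literal = URI.create("smb://server/share/a#b.txt");
        System.out.println(literal.getPath());     // /share/a
        System.out.println(literal.getFragment()); // b.txt

        // Escaped as %23, the '#' stays in the (decoded) path.
        URI escaped = URI.create("smb://server/share/a%23b.txt");
        System.out.println(escaped.getPath());     // /share/a#b.txt

        // Literal non-ASCII characters are accepted; toASCIIString()
        // produces an ASCII-only form using UTF-8 percent-escapes.
        URI unicode = URI.create("smb://server/share/\u041F\u0440\u0438\u0432\u0435\u0442.txt");
        System.out.println(unicode.toASCIIString());
        // smb://server/share/%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82.txt
    }
}
```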

Yes, that's a problem, though web pages can be assigned a character encoding
as well.  I imagine that there is a work-around, but I have no idea what it
is.

> I think we need to investigate the state of escaping Unicode in URLs.

Yes.  ...and also of presenting un-escaped Unicode characters in URLs if the
encoding is known.

> Certainly it has been discussed and implemented in one form or another.
> Is there a standard for it?

That's what I don't know.  I'm over my head in authentication right now and
have not had time to look.

> > represent kanji in its current settings.  This new problem (which you
> > correctly bring up) is that I now need to enter escapes in order to
> > connect to a server offering files with kanji names.  Ouch.  Which
> > encoding do I use?  UTF-8?  UCS2LE?
> >
> > I don't have an answer, but it's a good question.
> >
> Well, we're not really concerned with an "encoding" because we know what
> the encoding is going to be: ASCII.

I am using the term "encoding" to mean the mapping of a sequence of octets
to and from character representations.  In other words, using "encoding" as
I'm using it, UTF-8 and UCS2LE would be different encodings because the same
character is mapped to different byte sequences and vice-versa.  The old DOS
codepages would also be encodings.
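
A short Java sketch of "encoding" in the sense used above: one character, U+00E9 (é), mapped to three different byte sequences. ISO-8859-1 stands in for the single-byte DOS codepages, since it is one of the few charsets every Java platform is required to support; the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SameCharDifferentBytes {
    // Renders a byte array as space-separated hex octets.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String e = "\u00E9"; // LATIN SMALL LETTER E WITH ACUTE
        Charset[] encodings = {
            StandardCharsets.UTF_8, StandardCharsets.UTF_16LE, StandardCharsets.ISO_8859_1
        };
        for (Charset cs : encodings) {
            // UTF-8: C3 A9, UTF-16LE: E9 00, ISO-8859-1: E9
            System.out.println(cs.name() + ": " + hex(e.getBytes(cs)));
        }
    }
}
```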

I may be mangling the term.  I don't know the correct terminology here.

> The question is more like how do you represent a value that can be
> between 0x80 and 0x10FFFF in a sequence of ASCII characters? But it's
> not like we can just make something up.
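
One way to see the size of the problem, assuming UTF-8 plus percent-escaping were the answer (which this thread has not settled): the sketch below escapes code points at the edges of the 0x80 to 0x10FFFF range. Each non-ASCII code point costs 6, 9 or 12 ASCII characters. Surrogate code points (0xD800 to 0xDFFF) are left out because they cannot be encoded on their own; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class EscapeCost {
    // Percent-escapes the UTF-8 form of a single code point.
    static String escapeCodePoint(int cp) {
        byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : utf8) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Range edges where UTF-8 needs 2, 3 and 4 octets respectively.
        int[] samples = { 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF };
        for (int cp : samples) {
            String esc = escapeCodePoint(cp);
            System.out.printf("U+%04X -> %s (%d ASCII chars)%n", cp, esc, esc.length());
        }
    }
}
```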

Right.

This is a big kettle of worms.  I think you're right that we won't find the
answers without talking to the folks who are already working on these
problems.

Chris -)-----

-- 
Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org


