[jcifs] Creating file with hash ('#') in filename
Allen, Michael B (RSCH)
Michael_B_Allen at ml.com
Mon Jan 20 09:14:25 EST 2003
> -----Original Message-----
> From: Christopher R. Hertel [SMTP:crh at ubiqx.mn.org]
> Sent: Friday, January 17, 2003 6:20 PM
> To: Michael B. Allen
> Cc: jcifs at samba.org
> Subject: Re: [jcifs] Creating file with hash ('#') in filename
>
> On Fri, Jan 17, 2003 at 05:33:55PM -0500, Michael B. Allen wrote:
> :
> > > > No. You cannot have a Cyrillic filename on a web server. You can have a
> > > > Cyrillic *link* displayed in the page but the filenames and all parts of
> > > > the URL are ASCII. There might be extensions to this. I don't know. But
> > > > URLs are 100% good ol' ASCII.
> > >
> > > Then you cannot have a cyrillic filename in the SMB URL. It's the *same
> > > problem*. ...but there is a solution.
> >
> > Sure you can. Cyrillic, like Unicode, is a character set. It's not an
> > encoding. KOI8-R is a Cyrillic encoding:
> >
> > http://czyborra.com/charsets/cyrillic.html
> >
> > But this is nomenclature. You cannot have an HTTP URL in any encoding
> > other than ASCII. But you can have an SMB URL encoded in any encoding
> > because we are accepting Unicode and it is the superset of all character
> > sets.
>
> Urg. No. That's not the point.
>
Ok, well I'm not really sure what you're trying to get at. But I think we're both
being a little pedantic in this thread.
Let's review what we agree on. The characters that are required to be escaped
in the SMB URL for RFC2396 conformance are:
' |#%^`{}'
and non-ASCII characters. However, because SMB path names support
Unicode, it is not clear how these characters would be escaped. If each
character were converted to a UTF-8 multibyte sequence and each byte in turn
were escaped, the frequency of escapes and the appearance of these URLs would
make the process unreasonable, and for many scripts (e.g. Cyrillic) the URLs
would be pathologically unusable.
That's the problem. Right?
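To make the size problem concrete, here is a rough Java sketch of the
"convert to UTF-8, then escape every byte that is not unreserved" approach.
This is not jCIFS code; the class and method names are made up for
illustration, and the set of unreserved characters is my reading of RFC2396
(alphanumerics plus the "mark" characters).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of jCIFS.
class Utf8EscapeSketch {
    // RFC 2396 "mark" characters, which may appear unescaped.
    private static final String MARKS = "-_.!~*'()";

    static boolean isUnreserved(int b) {
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
            || (b >= '0' && b <= '9') || MARKS.indexOf(b) >= 0;
    }

    // Encode one path component as UTF-8 and %XX-escape every byte
    // that is not an unreserved ASCII character.
    static String escape(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte raw : name.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;
            if (isUnreserved(b)) {
                sb.append((char) b);
            } else {
                sb.append(String.format("%%%02X", b));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Four Cyrillic letters followed by ".txt" (8 characters in total).
        String name = "\u0444\u0430\u0439\u043b.txt";
        String escaped = escape(name);
        System.out.println(escaped);  // %D1%84%D0%B0%D0%B9%D0%BB.txt
        System.out.println(name.length() + " chars -> " + escaped.length() + " chars");
    }
}
```

Each Cyrillic letter becomes two UTF-8 bytes and therefore six ASCII
characters, so an 8 character name grows to 28 characters and is no longer
readable by a person.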
> I am suggesting that an implementation, such as jCIFS, may safely break
> this rule.
>
At the moment we do not have much choice. CIFS is a Unicode protocol. We
MUST provide a way to escape the escaping. However, you do realise that URLs
with Unicode characters cannot be embedded in web pages and other similar
places one might find them, because software may assume that all characters
in any URL are ASCII? I think we need to investigate the state of escaping
Unicode in URLs. Certainly it has been discussed and implemented in one form
or another. Is there a standard for it?
> represent kanji in its current settings. This new problem (which you
> correctly bring up) is that I now need to enter escapes in order to
> connect to a server offering files with kanji names. Ouch. Which
> encoding do I use? UTF-8? UCS2LE?
>
> I don't have an answer, but it's a good question.
>
Well, we're not really concerned with an "encoding" because we know what the
encoding is going to be: ASCII. The question is more like how do you represent
a value that can be between 0x80 and 0x10FFFF in a sequence of ASCII
characters? But it's not like we can just make something up.
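For the sake of discussion, one candidate answer is to write the value as its
UTF-8 bytes and emit each byte as %XX. The Java sketch below does that bit
manipulation by hand for a single code point. It is an assumption on my part
that this is the form we would want, not an established rule for SMB URLs,
and the class and method names are hypothetical.

```java
// Hypothetical illustration only; not part of jCIFS.
class CodePointEscapeSketch {
    // Represent one Unicode code point as %XX escapes of its UTF-8 bytes.
    // For simplicity every byte is escaped, including plain ASCII.
    static String escape(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not a Unicode scalar value");
        }
        int[] bytes;
        if (cp < 0x80) {            // 1 byte:  0xxxxxxx
            bytes = new int[] { cp };
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            bytes = new int[] { 0xC0 | (cp >> 6), 0x80 | (cp & 0x3F) };
        } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            bytes = new int[] { 0xE0 | (cp >> 12),
                                0x80 | ((cp >> 6) & 0x3F),
                                0x80 | (cp & 0x3F) };
        } else {                    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            bytes = new int[] { 0xF0 | (cp >> 18),
                                0x80 | ((cp >> 12) & 0x3F),
                                0x80 | ((cp >> 6) & 0x3F),
                                0x80 | (cp & 0x3F) };
        }
        StringBuilder sb = new StringBuilder();
        for (int b : bytes) {
            sb.append(String.format("%%%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape(0x80));      // %C2%80
        System.out.println(escape(0x6F22));    // %E6%BC%A2 (a kanji)
        System.out.println(escape(0x10FFFF));  // %F4%8F%BF%BF
    }
}
```

So the whole range from 0x80 to 0x10FFFF fits in two to four escaped bytes,
i.e. 6 to 12 ASCII characters per code point.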