[jcifs] Character Set discussions
Christopher R. Hertel
crh at ubiqx.mn.org
Wed Feb 5 13:36:47 EST 2003
On Tue, Feb 04, 2003 at 08:17:48PM -0500, Allen, Michael B (RSCH) wrote:
:
> > > odd character but for regularly occurring Unicode it's just insanity.
> >
> > Why? The user only ever sees it if they need to escape something, which
> > (since they are using Unicode) would only happen if the character is a
> > reserved character within the ASCII set.
> >
> For non-Latin users I think this would happen pretty frequently although I
> don't know how they're getting around the problem right now. They are
> probably restricting themselves to just using ASCII. But that's not an
> option for SMB URLs.
I would think that non-Latin users would have keyboards and software that
allow them to enter the Unicode characters they need.
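To illustrate the point about escaping only reserved ASCII characters, here is a minimal sketch. The class name, the method name, and the particular reserved set are all hypothetical, chosen for illustration; they are not taken from jCIFS or from any SMB URL draft. Non-ASCII characters pass through untouched, so the user only sees an escape where a reserved ASCII character appears.

```java
public class SmbUrlEscape {
    // Hypothetical reserved set, for illustration only. A real SMB URL
    // scheme would define its own list of reserved characters.
    private static final String RESERVED = "%?#;@";

    // Percent-escape reserved ASCII characters; leave everything else,
    // including all non-ASCII (Unicode) characters, exactly as entered.
    static String escapeReservedAscii(String segment) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < segment.length(); i++) {
            char c = segment.charAt(i);
            if (RESERVED.indexOf(c) >= 0) {
                out.append('%').append(String.format("%02X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Only the '#' is escaped; the accented letters are left alone.
        System.out.println(escapeReservedAscii("r\u00E9sum\u00E9#1.doc"));
    }
}
```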
> > Also, the input encoding doesn't matter as long as the underlying system
> > knows what it is. It could be UCS-2LE, for instance, and as long as the
> > terminal or browser knows that it can convert as necessary.
> >
> Right. The underlying system will just use Unicode. If it's wchar_t then
> it's probably UCS codes. The program will probably never know what the
> actual "encoding" is. It's just numbers in memory. Or perhaps you're
> thinking of the way it's encoded in the computer's memory. No matter.
Right. It shouldn't matter. We're just going to get a string of bytes.
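A small sketch of why the encoding shouldn't matter as long as it is known: the same string yields different bytes under different encodings, yet decodes back to the same text when the right charset is named. The class name, method name, and sample URL are made up for illustration. Java has no charset called UCS-2LE, so UTF-16LE stands in for it here; the two agree for characters in the Basic Multilingual Plane.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingRoundTrip {
    // Encode to bytes with the given charset, then decode with the same one.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Hypothetical URL containing three non-Latin characters.
        String url = "smb://server/share/\u65E5\u672C\u8A9E.txt";

        byte[] utf16le = url.getBytes(StandardCharsets.UTF_16LE);
        byte[] utf8 = url.getBytes(StandardCharsets.UTF_8);

        // The byte strings differ...
        System.out.println("same bytes: " + Arrays.equals(utf16le, utf8));
        // ...but each decodes back to the original text.
        System.out.println(roundTrip(url, StandardCharsets.UTF_16LE).equals(url));
        System.out.println(roundTrip(url, StandardCharsets.UTF_8).equals(url));
    }
}
```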
> But there is still the problem of representing these URLs in files
> like within an HTML document or configuration file. It's a little
> optimistic to think these serialized forms will all be in one Unicode
> encoding or another.
There is *supposed* to be a header declaring the encoding of the file (if
it's in HTML, for example). It will, as you suggest, take the Latin world
a while to get used to this.
> > Absolutely not. That's why they need to be able to enter it as Unicode
> > text, not as escapes.
> >
> Again, same issue of serialization. But if everyone displays Unicode then
> we'd be ok. So far Linux isn't up to the task. Actually Red Hat 8 uses a
> UTF-8 locale now by default so I suppose that is changing. The big
> question is the browsers. It all hinges on what they support.
Yes, probably. With China going Linux, though, I think we'll see a lot
more emphasis on Internationalization. Also a lot more emphasis on IPv6.
Hmmm... that latter point would mean that we really need to work on port
445 stuff. NBT doesn't work over IPv6, but naked transport does.
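As a rough sketch of what that implies for a client: pick the port from the address family. The class and method names are hypothetical, not jCIFS API. The assumptions are that NBT session service sits on port 139, that naked transport uses port 445, and that an IPv6 address rules out NBT.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class SmbPortChoice {
    static final int NBT_SESSION_PORT = 139; // SMB over NetBIOS (NBT)
    static final int DIRECT_PORT = 445;      // naked transport, no NBT

    // Parse an IP literal such as "::1" or "192.0.2.1".
    // For literals, InetAddress.getByName does no DNS lookup.
    static InetAddress parse(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("bad address: " + literal, e);
        }
    }

    // IPv6 leaves only port 445; on IPv4 the caller may still prefer NBT.
    static int portFor(InetAddress addr, boolean preferNbt) {
        if (addr instanceof Inet6Address) {
            return DIRECT_PORT;
        }
        return preferNbt ? NBT_SESSION_PORT : DIRECT_PORT;
    }

    public static void main(String[] args) {
        System.out.println(portFor(parse("::1"), true));
        System.out.println(portFor(parse("192.0.2.1"), true));
    }
}
```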
> > > Whohoo! Allright! Here we go .....
> >
> > I'm not sure what to make of that, Mike. :)
> >
> Just kidding.
Ah. Now I get it. No, I don't. Wait... let me catch the bus...
Chridz -)-----
--
Samba Team -- http://www.samba.org/ -)----- Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/ -)----- ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/ -)----- crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/ -)----- crh at ubiqx.org