[jcifs] Character Set discussions

Christopher R. Hertel crh at ubiqx.mn.org
Wed Feb 5 13:36:47 EST 2003

On Tue, Feb 04, 2003 at 08:17:48PM -0500, Allen, Michael B (RSCH) wrote:
> > > odd character but for regularly occurring Unicode it's just insanity.
> > 
> > Why?  The user only ever sees it if they need to escape something, which
> > (since they are using Unicode) would only happen if the character is a
> > reserved character within the ASCII set.
> > 
> For non-Latin users I think this would happen pretty frequently, although I
> don't know how they're getting around the problem right now. They are
> probably restricting themselves to just using ASCII. But that's not an
> option for SMB URLs.

I would think that non-Latin users would have keyboards and software that 
allow them to enter the Unicode characters they need.
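To make the point above concrete, here is a minimal sketch of the "escape only reserved ASCII" rule. It assumes a hypothetical helper, `SmbUrlEscape.escape`, applied to a single path component (a share or file name, not the whole URL). The exact set of reserved characters is also an assumption chosen for illustration; it is not taken from jCIFS or from any SMB URL specification. Non-ASCII Unicode characters pass through untouched, so a user typing in Chinese only ever sees a `%XX` escape for characters like a space or `#`.

```java
// Hypothetical helper, not part of jCIFS: escapes only reserved/unsafe
// ASCII characters in a single SMB URL path component.
final class SmbUrlEscape {
    // Assumed set of ASCII characters treated as reserved or unsafe
    // inside one path component (illustrative only).
    private static final String RESERVED = " %#?;:@&=+$,/<>\"{}|\\^`[]";

    /** Percent-escapes reserved ASCII characters and ASCII control
     *  characters; every non-ASCII character is left as-is. */
    static String escape(String component) {
        StringBuilder sb = new StringBuilder(component.length());
        for (int i = 0; i < component.length(); i++) {
            char c = component.charAt(i);
            boolean asciiNeedsEscape = c < 0x80
                    && (c < 0x20 || c == 0x7F || RESERVED.indexOf(c) >= 0);
            if (asciiNeedsEscape) {
                sb.append(String.format("%%%02X", (int) c)); // e.g. ' ' -> "%20"
            } else {
                sb.append(c); // includes CJK and other non-ASCII text
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\u5171\u6709" is the Chinese/Japanese word for "shared".
        System.out.println(escape("\u5171\u6709 #1")); // CJK kept, " " and "#" escaped
    }
}
```

With this rule the CJK characters in the example stay readable, and only the space and `#` become `%20` and `%23`.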

> > Also, the input encoding doesn't matter as long as the underlying system
> > knows what it is.  It could be UCS-2LE, for instance, and as long as the
> > terminal or browser knows that, it can convert as necessary.
> > 
> Right. The underlying system will just use Unicode. If it's wchar_t then
> it's probably UCS codes. The program will probably never know what the
> actual "encoding" is. It's just numbers in memory. Or perhaps you're
> thinking of the way it's encoded in the computer's memory. No matter.

Right.  It shouldn't matter.  We're just going to get a string of bytes.
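A small sketch of why the byte encoding "shouldn't matter" as long as it is known: the same name serialized as UTF-16LE and as UTF-8 produces different byte strings, but decoding each with its declared charset yields identical text. Java has no UCS-2 charset, so UTF-16LE stands in for UCS-2LE here; the two are byte-for-byte identical for characters in the Basic Multilingual Plane, which includes the example below. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration: bytes only become text once the charset is known.
final class DeclaredEncoding {
    /** Decodes raw bytes using the charset the source declared. */
    static String decode(byte[] raw, Charset declared) {
        return new String(raw, declared);
    }

    public static void main(String[] args) {
        String name = "\u5171\u6709"; // two BMP characters
        byte[] utf16le = name.getBytes(StandardCharsets.UTF_16LE); // 2 bytes per char, no BOM
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);       // 3 bytes per char here

        // Different byte strings on the wire or on disk...
        System.out.println(Arrays.equals(utf16le, utf8));
        // ...but the same text once each is decoded with its declared charset.
        System.out.println(decode(utf16le, StandardCharsets.UTF_16LE)
                .equals(decode(utf8, StandardCharsets.UTF_8)));
        // Guessing the wrong charset silently produces different (garbled) text.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals(name));
    }
}
```

The last line is the serialization problem raised below: if a file or page does not declare its encoding, the reader has to guess, and a wrong guess still "succeeds" with the wrong characters.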

> But there is still the problem of representing these URLs in files
> like within an HTML document or configuration file. It's a little
> optimistic to think these serialized forms will all be in one Unicode
> encoding or another.

There is *supposed* to be a header declaring the encoding of the file (if
it's in HTML, for example).  It will, as you suggest, take the Latin world
a while to get used to this.

> > Absolutely not.  That's why they need to be able to enter it as Unicode
> > text, not as escapes.
> > 
> Again, same issue of serialization. But if everyone displays Unicode then
> we'd be ok. So far Linux isn't up to the task. Actually Red Hat 8 uses a
> UTF-8 locale now by default so I suppose that is changing. The big
> question is the browsers. It all hinges on what they support.

Yes, probably.  With China going Linux, though, I think we'll see a lot
more emphasis on internationalization.  Also a lot more emphasis on IPv6.
Hmmm... that latter point would mean that we really need to work on port 
445 stuff.  NBT doesn't work over IPv6, but naked transport does.

> > > Woohoo! All right! Here we go .....
> > 
> > I'm not sure what to make of that, Mike.  :)
> > 
> 	Just kidding.

Ah.  Now I get it.  No, I don't.  Wait... let me catch the bus...

Chris -)-----

Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org
