[jcifs] Character Set discussions

Allen, Michael B (RSCH) Michael_B_Allen at ml.com
Wed Feb 5 12:17:48 EST 2003



> -----Original Message-----
> From:	Christopher R. Hertel [SMTP:crh at ubiqx.mn.org]
> > > That's the missing piece.  Thanks.  I can dig into that now.
> > 
> > Not so fast speedy. I don't think the UTF-8 technique is intended to
> > address representing Unicode in HTTP URLs. It's fine for the occasional
> 
> It's URLs in general, not just HTTP URLs.
> 
> > odd character but for regularly occurring Unicode it's just insanity.
> 
> Why?  The user only ever sees it if they need to escape something, which
> (since they are using Unicode) would only happen if the character is a
> reserved character within the ASCII set.
> 
	For non-Latin users I think this would happen pretty frequently although I don't know how
	they're getting around the problem right now. They are probably restricting themselves to
	just using ASCII. But that's not an option for SMB URLs.

> Also, the input encoding doesn't matter as long as the underlying system
> knows what it is.  It could be UCS-2LE, for instance, and as long as the
> terminal or browser knows that it can convert as necessary.
> 
	Right. The underlying system will just use Unicode. If it's wchar_t then it's probably UCS
	codes. The program will probably never know what the actual "encoding" is. It's just
	numbers in memory. Or perhaps you're thinking of the way it's encoded in the computer's
	memory. No matter. But there is still the problem of representing these URLs in files,
	like within an HTML document or configuration file. It's a little optimistic to think these
	serialized forms will all be in one Unicode encoding or another.
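
To make the serialization point concrete, here is a minimal Java sketch showing that the same in-memory string becomes different bytes depending on the charset used to write it out. The class and method names (EncodingBytesSketch, encodeHex) are invented for illustration and are not part of jcifs. The sample text is the "jesť" path component from the URL discussed in this thread.

```java
import java.nio.charset.Charset;

public class EncodingBytesSketch {

    // Render a byte array as uppercase hex, two digits per byte.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Serialize the text with the named charset and return the bytes as hex.
    // Characters the charset cannot represent are replaced (typically with '?').
    static String encodeHex(String text, String charsetName) {
        return hex(text.getBytes(Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        // "jes" followed by U+0165 (t with caron), written with an escape so the
        // source file's own encoding does not matter.
        String name = "jes\u0165";
        for (String cs : new String[] {"UTF-8", "UTF-16LE", "ISO-8859-1"}) {
            System.out.println(cs + ": " + encodeHex(name, cs));
        }
    }
}
```

The three charsets used here are ones every Java platform is required to support. ISO-8859-1 cannot represent U+0165 at all, so the character is lost on the way out, which is the kind of trouble a URL stored in a file of unknown encoding runs into.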

> >   smb://svr/slovak/m%C3%B4%C5%BEem/jes%C5%A5/sklo/nezran%C3%AD/ma.zip
> 
> Right.
> 
> > That's pretty ugly. Think people would want to work with URLs like that
> > on a regular basis?
> 
> Absolutely not.  That's why they need to be able to enter it as Unicode 
> text, not as escapes.
> 
	Again, same issue of serialization. But if everyone displays Unicode then we'd be ok. So
	far Linux isn't up to the task. Actually Red Hat 8 uses a UTF-8 locale now by default so I
	suppose that is changing. The big question is the browsers. It all hinges on what they
	support.
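
For reference, the escaped URL quoted above can be reproduced with a short Java sketch that percent-escapes the UTF-8 bytes of each path component. This is only an illustration under stated assumptions: SmbUrlEscapeSketch, escapeSegment, escapePath and unescapePath are hypothetical names, not jcifs API, and java.net.URLEncoder is a form encoder rather than a strict URL path encoder (it writes a space as '+', which the sketch turns into %20, and it leaves '*' unescaped).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class SmbUrlEscapeSketch {

    // Percent-escape one path component using its UTF-8 bytes.
    static String escapeSegment(String segment) throws UnsupportedEncodingException {
        // URLEncoder emits '+' for a space; a URL path wants %20 instead.
        return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
    }

    // Escape each component of a '/'-separated path, keeping the separators.
    static String escapePath(String path) throws UnsupportedEncodingException {
        StringBuilder out = new StringBuilder();
        String[] segments = path.split("/", -1);
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                out.append('/');
            }
            out.append(escapeSegment(segments[i]));
        }
        return out.toString();
    }

    // Reverse the escaping: %XX sequences are read back as UTF-8 bytes.
    static String unescapePath(String escaped) throws UnsupportedEncodingException {
        return URLDecoder.decode(escaped, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Non-ASCII characters are written as \\u escapes so the source file's
        // encoding does not affect the result.
        String path = "slovak/m\u00f4\u017eem/jes\u0165/sklo/nezran\u00ed/ma.zip";
        String escaped = escapePath(path);
        System.out.println("smb://svr/" + escaped);
        // The escaped form round-trips back to the original text.
        System.out.println(unescapePath(escaped).equals(path));
    }
}
```

The user-facing side would work with the unescaped string and only the serialized form would carry the %XX sequences, which is the split being argued over in this thread.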

> > Woohoo! All right! Here we go .....
> 
> I'm not sure what to make of that, Mike.  :)
> 
	Just kidding.

	Mike


