[jcifs] Character Set discussions
Eric
eglass1 at attbi.com
Sun Feb 9 22:09:23 EST 2003
>
> Yes, that terminal is obviously in a UTF-8 locale. Did you set that
> up or are you using UTF-8 as the default locale? Is it Red Hat 8.0? RH8
> uses the UTF-8 locale by default.
>
> Doesn't matter. Now that we have established you can properly display
> Unicode Strings here is the $24,000 question:
>
> If you display these URIs unescaped (not clear to me how you do that)
> what do they look like? Are the characters properly converted? How about
> in the URL with both escaped and unescaped characters?
>
> What I'm concerned will happen is that an escape sequence like %C5%A5
> will be converted into the Unicode characters U+00C5 followed by the
> character U+00A5 rather than being converted to the single character
> U+0165 as we intended.
>
> Mike
>
Okay, I think I see where you're coming from (Red Hat 8 is on this box,
incidentally).
You're asking, if I enter a URL like:
smb://svr/slovak/m%C3%B4žem/jesť/sklo/nezran%C3%AD/ma.zip
(with a mixture of escaped and unescaped) or even just
smb://svr/slovak/m%C3%B4%C5%BEem/jes%C5%A5/sklo/nezran%C3%AD/ma.zip
(all escaped), how does it interpret the %HH%HH sequences: as a single
UTF-8 encoded character or as two separate characters?
The URI.toString() method returns the raw URL as entered by the user --
in the first case above, a mix of escaped and unescaped. The
URI.toASCIIString() method escapes all the non-ASCII characters. There
isn't a method to UNescape the entire URI and return it. There are
methods to access the individual components in decoded form, however, and
they do interpret %HH%HH sequences as UTF-8 characters; you would do:
    String str = uri.getScheme() + ":" + uri.getSchemeSpecificPart();
    if (uri.getFragment() != null) {
        str += "#" + uri.getFragment();
    }
Which will give you the input URI with all %HH%HHs unescaped and decoded
as UTF-8 -- basically, a Java string with the Unicode characters.
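For anyone who wants to try this, here is a minimal, self-contained sketch of
the same idea. The class name UriDecodeDemo and the helper decode() are made up
for illustration (they are not part of jcifs or java.net), and the URL is a
shortened version of the one above. It assumes java.net.URI decodes escaped
octets as UTF-8, which is what its documentation describes.

```java
import java.net.URI;

public class UriDecodeDemo {
    // Hypothetical helper: rebuilds the URI text from its decoded components,
    // so each %HH%HH pair is read as one UTF-8 encoded character.
    static String decode(URI uri) {
        String str = uri.getScheme() + ":" + uri.getSchemeSpecificPart();
        if (uri.getFragment() != null) {
            str += "#" + uri.getFragment();
        }
        return str;
    }

    public static void main(String[] args) {
        // Mixed input: one escaped character (%C3%B4) and two unescaped ones,
        // written as Java Unicode escapes (U+017E and U+0165).
        URI mixed = URI.create("smb://svr/slovak/m%C3%B4\u017Eem/jes\u0165/ma.zip");

        // The raw text as entered, still a mix of escaped and unescaped.
        System.out.println(mixed.toString());
        // Every non-ASCII character escaped.
        System.out.println(mixed.toASCIIString());
        // Every escape decoded; readable only if the console supports it.
        System.out.println(decode(mixed));

        // All-escaped input: check which characters the decoder produced.
        String decoded = decode(URI.create("smb://svr/slovak/jes%C5%A5/ma.zip"));
        System.out.println("has U+0165: " + (decoded.indexOf('\u0165') >= 0));
        System.out.println("has U+00C5: " + (decoded.indexOf('\u00C5') >= 0));
    }
}
```

The last two lines are aimed at Mike's concern: if %C5%A5 were decoded byte by
byte, the string would contain U+00C5 instead of U+0165.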
Whether you can do a System.out.println(str) successfully would depend
on console support, as you noted; obviously, the ability to output the
character is limited by the ability of the console to represent it.
Since I am able to do so, it looks fine on my screen.
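If the console is known to accept UTF-8 but the JVM's default encoding is
something else, one possible workaround is to wrap System.out in a PrintStream
with an explicit encoding. This is only a sketch: the class name Utf8PrintDemo
and the helper utf8() are hypothetical, and it assumes the terminal really does
interpret its input as UTF-8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8PrintDemo {
    // Hypothetical helper: wraps any stream so that characters are written
    // as UTF-8 bytes regardless of the platform default encoding.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // U+0165 is written as the two bytes C5 A5; whether it shows up as a
        // single character depends on the terminal.
        out.println("jes\u0165");
    }
}
```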
Eric