[jcifs] Character Set discussions

Eric eglass1 at attbi.com
Sun Feb 9 22:09:23 EST 2003

> Yes, that terminal is obviously in a UTF-8 locale. Did you set that
> up or are you using UTF-8 as the default locale? Is it Red Hat 8.0? RH8
> uses the UTF-8 locale by default.
> Doesn't matter. Now that we have established you can properly display
> Unicode Strings here is the $24,000 question:
> If you display these URIs unescaped (not clear to me how you do that)
> what do they look like? Are the characters properly converted? How about
> in the URL with both escaped and unescaped characters?
> What I'm concerned will happen is that an escape sequence like %C5%A5
> will be converted into the Unicode characters U+00C5 followed by the
> character U+00A5 rather than being converted to the single character
> U+0165 as we intended.
> Mike

Okay, I think I see where you're coming from (Red Hat 8 is on this box).

You're asking, if I enter a URL like:


(with a mixture of escaped and unescaped) or even just


(all escaped), how does it interpret the %HH%HHs -- as a single UTF-8 
encoded char or as 2 separate characters?

The URI.toString() method returns the raw URL as entered by the user -- 
in the first case above, a mix of escaped and unescaped.  The 
URI.toASCIIString() method escapes all the non-ASCII characters.  There 
isn't a method to UNescape the entire URI and return it.  There are 
methods to access the different components in this fashion, however, and 
they do interpret %HH%HHs as UTF-8 characters; you would do

String str = uri.getScheme() + ":" + uri.getSchemeSpecificPart();
if (uri.getFragment() != null) {
     str += "#" + uri.getFragment();
}

Which will give you the input URI with all %HH%HHs unescaped and decoded 
as UTF-8 -- basically, a Java string with the Unicode characters.
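
To make that concrete, here is a rough, self-contained sketch of the same idea. The server, share, and file names are made up for illustration, and the class and method names (UriDecodeSketch, unescape) are mine, not anything in jcifs or the JDK. It assumes an absolute URI, since getScheme() returns null for a relative one.

```java
import java.net.URI;

class UriDecodeSketch {

    // Rebuild the URI from its decoded components. Assumes an absolute URI;
    // for a relative one getScheme() returns null and this would need a guard.
    static String unescape(URI uri) {
        String str = uri.getScheme() + ":" + uri.getSchemeSpecificPart();
        if (uri.getFragment() != null) {
            str += "#" + uri.getFragment();
        }
        return str;
    }

    public static void main(String[] args) {
        // Hypothetical URLs: one fully escaped, one mixing a literal U+0165
        // with an escaped one.
        URI escaped = URI.create("smb://server/share/%C5%A5.txt");
        URI mixed = URI.create("smb://server/share/\u0165%C5%A5.txt");

        // toString() gives back the input as entered.
        System.out.println(escaped.toString());

        // toASCIIString() escapes the literal non-ASCII character as UTF-8.
        System.out.println(mixed.toASCIIString());

        // The component getters decode %C5%A5 as one UTF-8 character.
        String decoded = unescape(escaped);
        System.out.println(decoded.indexOf('\u0165') >= 0);  // single U+0165 present
        System.out.println(decoded.indexOf('\u00C5') >= 0);  // no stray U+00C5
    }
}
```

The last two lines only print booleans, so the check does not depend on what the console can display.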

Whether you can do a System.out.println(str) successfully would depend 
on console support, as you noted; obviously, the ability to output the 
character is limited by the ability of the console to represent it. 
Since I am able to do so, it looks fine on my screen.
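
If the default encoding of System.out does not match the terminal, one possible workaround is to wrap it in a PrintStream with an explicit encoding. This is only a sketch under the assumption that the terminal really is in a UTF-8 locale; the class and helper names (Utf8ConsoleSketch, utf8, encode) are hypothetical, and the URL is made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8ConsoleSketch {

    // Wrap any stream in a PrintStream that encodes characters as UTF-8.
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Helper for inspecting the bytes that would be sent to the console.
    static byte[] encode(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream ps = utf8(buf);
        ps.print(s);
        ps.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // Whether this renders correctly still depends on the terminal.
        PrintStream out = utf8(System.out);
        out.println("smb://server/share/\u0165.txt");
    }
}
```

The encode helper is there so the bytes can be checked without relying on what the screen shows; for U+0165 they should be the same C5 A5 pair that appears in the escaped URL.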

