[jcifs] Character Set discussions

Michael B. Allen miallen at eskimo.com
Mon Feb 10 06:37:07 EST 2003


On Sun, 09 Feb 2003 06:09:23 -0500
Eric <eglass1 at attbi.com> wrote:

> > What I'm concerned will happen is that an escape sequence like %C5%A5
> > will be converted into the Unicode characters U+00C5 followed by the
> > character U+00A5 rather than being converted to the single character
> > U+0165 as we intended.
> 
> There isn't a method to UNescape the entire URI and return it.  There are 
> methods to access the different components in this fashion, however, and 
> they do interpret %HH%HHs as UTF-8 characters; you would do
> 
> String str = uri.getScheme() + ":" + uri.getSchemeSpecificPart();
> if (uri.getFragment() != null) {
>      str += "#" + uri.getFragment();
> }
> 
> Which will give you the input URI with all %HH%HHs unescaped and decoded 
> as UTF-8 -- basically, a Java string with the Unicode characters.
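
For anyone who wants to try the quoted approach, here is a minimal,
self-contained sketch using java.net.URI. The class name, the helper
name and the smb://server/share/... URL are made up for illustration;
only the URI getters come from the snippet above. It prints code points
rather than the characters themselves so the result does not depend on
what the console can display.

```java
import java.net.URI;

public class UriUnescapeSketch {
    // Rebuild the URI text from its decoded components, following the
    // quoted snippet. Assumes an absolute URI (getScheme() is not null).
    static String unescape(String escaped) {
        URI uri = URI.create(escaped);
        String str = uri.getScheme() + ":" + uri.getSchemeSpecificPart();
        if (uri.getFragment() != null) {
            str += "#" + uri.getFragment();
        }
        return str;
    }

    public static void main(String[] args) {
        // Hypothetical URL; %C5%A5 is the UTF-8 encoding of U+0165.
        String str = unescape("smb://server/share/%C5%A5.txt");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            // Show non-ASCII characters as U+XXXX, leave ASCII as-is.
            sb.append(c < 0x80 ? String.valueOf(c)
                               : String.format("[U+%04X]", (int) c));
        }
        System.out.println(sb);
    }
}
```

If the getters decode the escapes as UTF-8, the two escaped octets come
back as the single character U+0165 and not as U+00C5 followed by U+00A5.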

Ok. So it works. I'm a little surprised, but I'm glad I was wrong. However,
now I wonder if this behavior is locale dependent, meaning that if you do
the same thing in a Latin1 locale the escapes *are* interpreted as
individual characters rather than as a UTF-8 sequence. They should be, and
I suspect they will, because that's trivial by comparison. In theory I
suppose this can work provided the escaping takes the locale and
surrounding characters into consideration. But I bet that was a hairy
piece of code. I don't think I'm up to reproducing this in the smb
Handler. Perhaps we can backport the URI class.
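
The locale question can be checked with a small sketch like the one
below. It decodes the two octets behind %C5%A5 once as Latin1 and once
as UTF-8, and compares both with what java.net.URI returns. The class
name, method names and the smb://server/... URL are hypothetical. As far
as I can tell from the java.net.URI documentation, escaped octets are
always decoded as UTF-8, so the URI result should match the UTF-8 case
whatever the default locale is, but treat that as something to verify on
your own JVM.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class EscapeInterpretationSketch {
    // The two octets named by the escape sequence %C5%A5.
    static final byte[] OCTETS = { (byte) 0xC5, (byte) 0xA5 };

    // Latin1 view: each octet becomes one character.
    static String asLatin1() {
        return new String(OCTETS, StandardCharsets.ISO_8859_1);
    }

    // UTF-8 view: the two octets form one character.
    static String asUtf8() {
        return new String(OCTETS, StandardCharsets.UTF_8);
    }

    // What java.net.URI yields for the decoded path (hypothetical URL).
    static String viaUri() {
        return URI.create("smb://server/%C5%A5").getPath();
    }

    // Format a string as a space-separated list of U+XXXX code units.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Latin1: " + codePoints(asLatin1()));
        System.out.println("UTF-8:  " + codePoints(asUtf8()));
        System.out.println("URI:    " + codePoints(viaUri()));
    }
}
```

Running it under different default locales and comparing the "URI" line
with the other two shows which interpretation is actually being applied.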

> Whether you can do a System.out.println(str) successfully would depend 
> on console support, as you noted; obviously, the ability to output the 
> character is limited by the ability of the console to represent it. 
> Since I am able to do so, it looks fine on my screen.

Right. We are only concerned with how a Unicode string is handled
internally. Getting one from the console or displaying one correctly on
the console is a totally separate matter.

Mike

-- 
A  program should be written to model the concepts of the task it
performs rather than the physical world or a process because this
maximizes  the  potential  for it to be applied to tasks that are
conceptually  similar and, more important, to tasks that have not
yet been conceived. 

