[jcifs] Character Set discussions

Michael B. Allen miallen at eskimo.com
Sun Feb 9 08:12:03 EST 2003


On Sat, 08 Feb 2003 06:37:43 -0500
Eric <eglass1 at attbi.com> wrote:

> > Well how are you entering the strings? If you are entering them on
> > the console in a non-UTF-8 locale they will not be converted to
> > Unicode. Each byte in the UTF-8 sequence will just be treated as
> > an individual character.
> 
> Exactly -- any time you have a stream of BYTES representing a URI, it 
> needs to be escaped.  I'm assuming that if you are using Unicode 
> characters in your URI, you are using a character-based representation. 
>   If a user is going to enter a URI from a console which can't handle 
> Unicode characters, they would need to escape any non-ASCII chars.

It's not clear to me what you mean by "streams of bytes". With the
console it is necessary to convert the UTF-8 arguments to get a proper
Java String. I was just using getBytes() to illustrate what happens
if you don't. I think it only served to obscure the real question, which
I hope I have posed more clearly below.
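
To make the console conversion concrete, here is a minimal sketch. It
assumes the JVM decoded the UTF-8 argument bytes as ISO-8859-1, so that
each char holds exactly one original byte; with other default encodings
this round trip may lose data. The class and method names are made up for
illustration and are not part of jCIFS.

```java
import java.nio.charset.StandardCharsets;

public class ConsoleArgDemo {
    // Assumption: 'misdecoded' was produced by reading UTF-8 bytes as
    // ISO-8859-1, so every char maps back to exactly one original byte.
    static String reinterpretAsUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The two bytes 0xC5 0xA5, each treated as an individual character.
        String misdecoded = "\u00C5\u00A5";
        String fixed = reinterpretAsUtf8(misdecoded);
        System.out.println(fixed.length());                        // prints 1
        System.out.println(Integer.toHexString(fixed.charAt(0)));  // prints 165
    }
}
```

Without the conversion the String keeps two characters, one per byte,
which is the effect getBytes() was meant to show.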

> I attached a screen shot of what I am seeing (not sure if it will make 
> it through to the list)... is this correct?

Yes, that terminal is obviously in a UTF-8 locale. Did you set that
up or are you using UTF-8 as the default locale? Is it Red Hat 8.0? RH8
uses the UTF-8 locale by default.

Doesn't matter. Now that we have established that you can properly
display Unicode strings, here is the $24,000 question:

If you display these URIs unescaped (not clear to me how you do that)
what do they look like? Are the characters properly converted? How about
in the URL with both escaped and unescaped characters?

My concern is that an escape sequence like %C5%A5 will be converted
into two Unicode characters, U+00C5 followed by U+00A5, rather than
into the single character U+0165 as we intended.
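
The two outcomes can be reproduced with the standard
java.net.URLDecoder, which takes the charset used to interpret the
unescaped bytes. This is only a sketch of the concern, not a claim about
how jCIFS or any particular URI parser decodes escapes; the class and
helper names are hypothetical. Note that URLDecoder also turns '+' into
a space, which is form-encoding behavior and not general URI behavior.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class EscapeDecodeDemo {
    // Decode %XX escapes, interpreting the resulting bytes in 'charset'.
    static String decode(String escaped, String charset) {
        try {
            return URLDecoder.decode(escaped, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Render each char of the string as U+XXXX, separated by spaces.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Bytes C5 A5 read as one UTF-8 sequence: the intended result.
        System.out.println(codePoints(decode("%C5%A5", "UTF-8")));      // prints U+0165
        // Each byte read as its own character: the result to avoid.
        System.out.println(codePoints(decode("%C5%A5", "ISO-8859-1"))); // prints U+00C5 U+00A5
    }
}
```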

Mike

-- 
A  program should be written to model the concepts of the task it
performs rather than the physical world or a process because this
maximizes  the  potential  for it to be applied to tasks that are
conceptually  similar and, more important, to tasks that have not
yet been conceived. 

