[jcifs] Character Encoding Problems on OS/390
Eric Glass
eglass1 at attbi.com
Fri Oct 25 10:03:31 EST 2002
On Thu, 2002-10-24 at 18:01, Allen, Michael B (RSCH) wrote:
> Or you mean use char[] for all array work and then at the last minute create a
> String from it and do getBytes( "ISO-8859-1" ). I still don't understand where the
> UTF-8 comes in though. Also you sound like you know of all of the locations in
> the code where these changes would need to occur but almost all operations
> are byte oriented. Can you give me a few example locations?
>
The UTFs (Unicode Transformation Formats) are means of representing
Unicode characters. In UTF-8's case, only 0-127 (the ASCII range) are
represented as a single byte; 128-255 take two bytes each, so UTF-8
matches ISO-8859-1 only for ASCII. Characters above 255 take two or
more bytes. ISO-8859-1 maps each of 0-255 to exactly one byte and is
incapable of representing 256+.
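To make the byte widths concrete, here is a minimal sketch that counts how many bytes one character occupies in each encoding. The class and method names are made up for illustration, and it assumes Java 7 or later for StandardCharsets (on older JDKs the string-name form getBytes("UTF-8") would be used instead).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of jCIFS.
final class EncodingWidths {
    // Number of bytes needed to encode one char in UTF-8.
    static int utf8Length(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes produced for one char in ISO-8859-1 (always one).
    static int latin1Length(char c) {
        return String.valueOf(c).getBytes(StandardCharsets.ISO_8859_1).length;
    }

    public static void main(String[] args) {
        // 'A' is ASCII, U+00E9 is in the 128-255 range, U+20AC is above 255.
        char[] samples = { 'A', '\u00E9', '\u20AC' };
        for (char c : samples) {
            System.out.println("U+" + Integer.toHexString(c).toUpperCase()
                    + " utf8=" + utf8Length(c)
                    + " latin1=" + latin1Length(c));
        }
    }
}
```

Only the ASCII character comes out as one byte in both encodings; U+00E9 already needs two bytes in UTF-8 while staying one byte in ISO-8859-1.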
A good explanation of the various character sets can be found here:
http://www.czyborra.com/utf/
http://czyborra.com/charsets/iso8859.html
as well as a brief discussion of EBCDIC, the issue at hand:
http://czyborra.com/charsets/iso646.html
As far as jCIFS is concerned, it probably doesn't matter which of the
two you use for characters over 255: UTF-8 would encode them (and
anything over 127) as multiple bytes, which (I'm guessing) would be
meaningless to jCIFS, while ISO-8859-1 can't represent them at all, and
the behavior of getBytes in that case is unspecified (according to the
String Javadocs). So either way, you'll probably get garbage for any
input characters over 255, which isn't really an issue unless the
underlying network protocol has specified a means of handling them (in
which case you would use the specified encoding).
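If it matters whether a given String would survive the conversion, one option is to ask the encoder up front. This is only a sketch with hypothetical names, assuming Java 7 or later; note that the Charset overload of getBytes used here is documented to substitute the charset's replacement byte for unmappable characters, whereas the string-name overload leaves that case unspecified.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of jCIFS.
final class UnmappableCheck {
    // True if every char of s can be represented in ISO-8859-1 (0-255).
    static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    // Encodes with ISO-8859-1; unmappable chars become the replacement byte '?'.
    static byte[] encodeLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String ok = "caf\u00E9";   // all chars <= 255
        String bad = "\u20AC5";    // euro sign is above 255
        System.out.println(fitsLatin1(ok));
        System.out.println(fitsLatin1(bad));
        // The euro sign is lost: the first byte is 0x3F ('?').
        System.out.println(Integer.toHexString(encodeLatin1(bad)[0] & 0xFF));
    }
}
```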
As far as the actual code changes required for jCIFS, they should only
need to be applied at the points of conversion between a String and a
byte[]. The most common would be something like:
String myString = "hello there.";
byte[] myBytes = myString.getBytes();
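which would just need to be changed to the following (getBytes(String) throws the checked UnsupportedEncodingException, so it must be caught or declared):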
String myString = "hello there.";
byte[] myBytes = myString.getBytes("ISO-8859-1");
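To see why the explicit charset matters, the sketch below prints the bytes produced by the no-argument getBytes() next to the ISO-8859-1 bytes. The class name and hex helper are hypothetical, it assumes Java 7 or later, and the EBCDIC comparison assumes the JVM ships the IBM1047 charset (it is skipped otherwise). I am also assuming, rather than asserting, that an OS/390 JVM reports an EBCDIC code page as its default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo, not part of jCIFS.
final class DefaultCharsetDemo {
    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "hello";
        // Depends on the platform: this is the call that varies between hosts.
        System.out.println("default (" + Charset.defaultCharset() + "): "
                + hex(s.getBytes()));
        // Always the same bytes, whatever the platform default is.
        System.out.println("ISO-8859-1: "
                + hex(s.getBytes(StandardCharsets.ISO_8859_1)));
        // Optional: what an EBCDIC default would have produced.
        if (Charset.isSupported("IBM1047")) {
            System.out.println("IBM1047: "
                    + hex(s.getBytes(Charset.forName("IBM1047"))));
        }
    }
}
```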
Another instance would be:
byte[] myBytes;
...
String myString = new String(myBytes);
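which would be changed to the following (String(byte[], String) throws the checked UnsupportedEncodingException, so it must be caught or declared):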
byte[] myBytes;
...
String myString = new String(myBytes, "ISO-8859-1");
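Since almost all of the operations in question are byte oriented, the property that probably matters most is that every byte value survives a trip through a String and back. Here is a small sketch of that check, with hypothetical names and assuming Java 7 or later; it also runs the same check with UTF-8 for contrast.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical check, not part of jCIFS.
final class Latin1RoundTrip {
    // Decodes the bytes to a String and encodes them again with the same
    // charset, reporting whether the original bytes come back unchanged.
    static boolean roundTrips(byte[] original, Charset charset) {
        String decoded = new String(original, charset);
        byte[] encoded = decoded.getBytes(charset);
        return Arrays.equals(original, encoded);
    }

    // Builds an array holding every possible byte value, 0x00 through 0xFF.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        // ISO-8859-1 maps each byte to exactly one char, so nothing is lost.
        System.out.println("ISO-8859-1: " + roundTrips(all, StandardCharsets.ISO_8859_1));
        // Arbitrary bytes are not valid UTF-8, so malformed input is replaced.
        System.out.println("UTF-8: " + roundTrips(all, StandardCharsets.UTF_8));
    }
}
```

ISO-8859-1 preserves all 256 values, while UTF-8 does not, because bytes 0x80-0xFF taken on their own are not well-formed UTF-8 sequences.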
Eric