[jcifs] Russian characters.

Michael B Allen mballen at erols.com
Sat Feb 16 07:54:26 EST 2002


On Fri, 15 Feb 2002 11:14:57 -0600
Christopher R.Hertel <crh at ubiqx.mn.org> wrote:

> This is probably a configuration issue.  jCIFS uses the Unicode support
> built into Java.  I know that there is some information in the SMB
> protocol that is supposed to indicate which character set is being used,
> but you would also need to make that character set available to jCIFS.
> 
> Anyone else on the list have more information on using Unicode?

There's a flag in one of the initial SMBs that indicates whether or not
Unicode should be used as opposed to ASCII. But in reality the "ASCII"
does not have to be 7-bit ASCII. It can be any 8-bit encoding provided
the client and server are both configured to use that encoding. I think
that's how it works. For example, on Samba you can compile it to use, say,
ISO-8859-5 (Cyrillic, i.e. Russian) (actually I don't know if you really
can compile Samba with this *particular* character set). Now if the client
speaks ISO-8859-5, the "ASCII" is really ISO-8859-5 and everyone's happy.
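
To make the point about 8-bit "ASCII" concrete, here is a small sketch
in plain Java (not jCIFS code). The class and method names are made up
for illustration, and it assumes the JVM ships the ISO-8859-5 charset,
which most do but which is not one of the guaranteed ones. The same
bytes only read back correctly when both ends agree on the encoding:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EightBitDemo {
    // Assumption: ISO-8859-5 is available in this JVM; forName() throws
    // UnsupportedCharsetException if it is not.
    public static final Charset CYRILLIC = Charset.forName("ISO-8859-5");

    // What a sender configured for ISO-8859-5 would put on the wire.
    public static byte[] encode(String s) {
        return s.getBytes(CYRILLIC);
    }

    // What a receiver sees, depending on the charset it assumes.
    public static String decode(byte[] wire, Charset assumed) {
        return new String(wire, assumed);
    }

    public static void main(String[] args) {
        // Six Cyrillic letters, written as escapes so the source file's
        // own encoding does not matter.
        String name = "\u041f\u0440\u0438\u0432\u0435\u0442";
        byte[] wire = encode(name);

        // One byte per character, all of them above 0x7F.
        System.out.println(wire.length + " bytes on the wire");

        // Both sides agree on ISO-8859-5: the name survives.
        System.out.println(decode(wire, CYRILLIC).equals(name));

        // Receiver assumes ISO-8859-1 instead: same bytes, wrong text.
        System.out.println(decode(wire, StandardCharsets.ISO_8859_1).equals(name));
    }
}
```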

The jCIFS client does both "ASCII" mode and Unicode. If your server
negotiates Unicode then use that. If the server only negotiates, say,
ISO-8859-5 then the host JVM must be configured to use this encoding. This
is likely controlled using the C locale mechanism, but I think Java
controls this through System properties. If you dump your System
properties (-Dlog=ALL does this in the beginning) you'll see:

...
user.language=en
user.region=US
user.timezone=America/Denver
...

These are probably important properties for controlling the "ASCII"
character set in jCIFS. Of course the host machine is likely already
configured to use the proper character set, or other software would do
things as strange as replacing characters with question marks. Again,
just use Unicode if you can.
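
If you want to check what your JVM thinks without turning on jCIFS
logging, something like the following plain-Java sketch should do. The
class name is made up, and whether jCIFS consults exactly these values
for its "ASCII" mode is an assumption on my part. file.encoding and
user.country are not in the dump above; they are standard JVM
properties that may be worth a look as well:

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class CharsetReport {
    // Properties that plausibly influence the 8-bit character set.
    // Any of them may be null on a given JVM.
    private static final String[] KEYS = {
        "file.encoding", "user.language", "user.region",
        "user.country", "user.timezone"
    };

    public static String report() {
        StringBuilder sb = new StringBuilder();
        for (String key : KEYS) {
            sb.append(key).append('=')
              .append(System.getProperty(key)).append('\n');
        }
        // The charset used by String.getBytes() and new String(byte[])
        // when no charset is named explicitly.
        sb.append("default charset=")
          .append(Charset.defaultCharset().name()).append('\n');
        sb.append("default locale=")
          .append(Locale.getDefault()).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```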

Now, having said all of this, I don't think it has anything to do
with the OP's question. The question marks are a substitution character
used when text is translated into a lesser encoding that cannot
represent it, or when *the font used does not have glyphs for those
characters*.
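
The first cause (translation into a lesser encoding) is easy to
reproduce in plain Java. This is only a sketch with made-up names, not
anything jCIFS does internally, but it shows where the question marks
come from:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class SubstitutionDemo {
    // Encode into the lesser charset and decode again. String.getBytes
    // silently replaces unmappable characters with the charset's
    // replacement byte, which is '?' for US-ASCII.
    public static String roundTrip(String s, Charset lesser) {
        return new String(s.getBytes(lesser), lesser);
    }

    // Ask up front whether every character can be represented.
    public static boolean representable(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String name = "\u041f\u0440\u0438\u0432\u0435\u0442"; // six Cyrillic letters

        System.out.println(representable(name, StandardCharsets.US_ASCII)); // prints "false"
        System.out.println(roundTrip(name, StandardCharsets.US_ASCII));     // prints "??????"
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name)); // prints "true"
    }
}
```

The second cause (a font without the glyphs) cannot be seen this way,
since the characters themselves are intact and only the display is
wrong.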

Mike

-- 
May The Source be with you.



