[jcifs] problem encoding
Christopher R. Hertel
crh at ubiqx.mn.org
Fri Jan 17 08:44:47 EST 2003
Okay, spent a few minutes on #samba-technical and got some answers on
this.
I was probably incorrect in connecting the earlier discussion to the
problem that Andrea reported.
That aside, there is still a problem.
- If the client and server negotiate Unicode then everything works just
fine.
- If either side is unable to handle Unicode, then both must use the same
8-bit encoding (the same DOS OEM codepage), or any character above ASCII
127 is at risk of being mapped incorrectly.
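To make the second case concrete, here is a minimal Python sketch of a codepage mismatch. The file name and the choice of cp850 on one side and cp437 on the other are assumptions picked for illustration; they are not taken from Andrea's report, and this is not how jCIFS or Samba is implemented.

```python
# Hypothetical file name containing one character above ASCII 127.
name = "Ørsted.txt"

# Assumption: the sending side encodes with DOS OEM codepage 850.
wire_bytes = name.encode("cp850")

# Assumption: the receiving side decodes the same bytes with codepage 437.
seen_by_peer = wire_bytes.decode("cp437")

# When both sides use the same codepage, the name round-trips intact.
same_codepage = wire_bytes.decode("cp850")

# Names limited to ASCII (127 and below) survive the mismatch unchanged.
ascii_name = "report.txt"
ascii_seen_by_peer = ascii_name.encode("cp850").decode("cp437")

print(repr(name), "->", repr(seen_by_peer))
print("same codepage round-trip ok:", same_codepage == name)
print("ASCII survives mismatch:", ascii_seen_by_peer == ascii_name)
```

Only the first character of the name changes, which is why this kind of problem tends to go unnoticed until someone uses accented or other non-ASCII characters.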
This is, in fact, a problem for Samba 2.2.x. Full Unicode support is in
3.x, but not in 2.2.x.
Note that this is also an SMB protocol bug, not a client or server bug.
There is nothing in SMB that allows negotiation of the codepage, which is
a major oversight. I guess they never figured that people from different
nationalities might want to communicate.
One more interesting note: UTF-8 is a multi-byte encoding. ASCII values
(codes 127 and below) are stored in one byte. Anything above is stored in
two to four bytes, each with the high-order bit set. I am told that UTF-8
is *not* used in SMB at all.
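A short Python sketch of the UTF-8 byte layout follows. The sample characters are arbitrary choices for illustration. It checks how many bytes each character needs and whether every byte of a multi-byte sequence has its high-order bit set.

```python
# Arbitrary sample characters: one ASCII, three above ASCII 127.
samples = ["A", "é", "€", "😀"]

byte_lengths = []
high_bit_everywhere = []

for ch in samples:
    encoded = ch.encode("utf-8")
    byte_lengths.append(len(encoded))
    # True when every byte in the encoded sequence has bit 7 set.
    high_bit_everywhere.append(all(b & 0x80 for b in encoded))
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')} ({len(encoded)} byte(s))")
```

The ASCII character takes a single byte with the high bit clear, so plain ASCII text is byte-for-byte identical in UTF-8. The other characters take more than one byte, and none of those bytes can be mistaken for an ASCII byte.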
Chris -)-----
--
Samba Team -- http://www.samba.org/ -)----- Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/ -)----- ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/ -)----- crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/ -)----- crh at ubiqx.org