[jcifs] problem encoding

Christopher R. Hertel crh at ubiqx.mn.org
Fri Jan 17 08:44:47 EST 2003


Okay, spent a few minutes on #samba-technical and got some answers on 
this.

I was probably incorrect in connecting the earlier discussion to the
problem that Andrea reported.

That aside, there is still a problem.
- If the client and server negotiate Unicode then everything works just 
  fine.

- If either side is unable to handle Unicode, then both must be using
  the same 8-bit encoding (same DOS OEM codepage), or any byte value
  above 127 (that is, outside ASCII) is at risk of being mapped
  incorrectly.
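
To make the second case concrete, here is a minimal Java sketch of what
happens when the two sides use different OEM codepages. The codepage pair
(850 on the sender, 437 on the receiver) is an assumption chosen for
illustration, not taken from Andrea's report. The class and method names
are made up, and the charset names "IBM850" and "IBM437" are the ones a
typical full JDK provides, though Java does not guarantee that they are
installed.

```java
import java.nio.charset.Charset;

public class CodepageMismatch {
    // Encode text with the sender's 8-bit codepage, then decode the same
    // bytes with the receiver's codepage. With no Unicode negotiated,
    // nothing on the wire says which codepage the bytes are in.
    static String roundTrip(String text, String senderCp, String receiverCp) {
        byte[] wire = text.getBytes(Charset.forName(senderCp));
        return new String(wire, Charset.forName(receiverCp));
    }

    public static void main(String[] args) {
        String name = "caf\u00E9 \u00F8l"; // e-acute and o-with-stroke

        // Same codepage on both ends: the name survives.
        System.out.println(roundTrip(name, "IBM850", "IBM850"));

        // Different codepages: byte 0x9B is o-with-stroke in 850 but the
        // cent sign in 437, so that character comes out wrong.
        System.out.println(roundTrip(name, "IBM850", "IBM437"));
    }
}
```

Note that the e-acute happens to sit at the same byte value (0x82) in both
codepages, so it survives the mismatch. That is part of what makes this
kind of problem hard to spot: only some names get mangled.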

This is, in fact, a problem for Samba 2.2.x.  Full Unicode support is in
3.x, but not in 2.2.x.

Note that this is also an SMB protocol bug, not a client or server bug.  
There is nothing in SMB that allows negotiation of the codepage, which is 
a major oversight.  I guess they never figured that people from different 
nationalities might want to communicate.

One more interesting note:  UTF-8 is a multi-byte encoding.  ASCII values 
(codes 127 and below) are stored in one byte.  Anything above is stored in 
two or more bytes, each with the high-order bit set.  I am told that UTF-8 
is *not* used in SMB at all.
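
The UTF-8 byte counts are easy to check from Java. The following is only
a small sketch. The class and method names are made up for illustration,
and the sample characters are arbitrary picks.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True when every byte of the UTF-8 encoding has the high-order bit set.
    static boolean allHighBitSet(String s) {
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if ((b & 0x80) == 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // U+0041 (ASCII), U+00E9, U+20AC, U+1F600 (written as a surrogate pair)
        String[] samples = { "A", "\u00E9", "\u20AC", "\uD83D\uDE00" };
        for (String s : samples) {
            // Lengths are 1, 2, 3 and 4 bytes respectively; only the
            // ASCII sample has a byte with the high-order bit clear.
            System.out.println(utf8Length(s) + " byte(s), high bit on all: "
                    + allHighBitSet(s));
        }
    }
}
```

Because the bytes of a non-ASCII character all have the high-order bit
set, none of them can be mistaken for an ASCII character such as a path
separator.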

Chris -)-----

-- 
Samba Team -- http://www.samba.org/     -)-----   Christopher R. Hertel
jCIFS Team -- http://jcifs.samba.org/   -)-----   ubiqx development, uninq.
ubiqx Team -- http://www.ubiqx.org/     -)-----   crh at ubiqx.mn.org
OnLineBook -- http://ubiqx.org/cifs/    -)-----   crh at ubiqx.org


