Unicode 3.0 handling in Mac OS X and Samba.

Ryo Kawahara rkawa at lbe.co.jp
Mon May 7 11:45:31 GMT 2001


Hello, everyone.

We can't open files whose names contain "composite characters"
in a Samba share hosted on Mac OS X
with coding system = utf8.

Some members of the sugj-tech at samba.gr.jp list tested
samba-2.0.7-ja-2.2 (the Japanese version, which is based on
samba-2.0.7 and has the same coding system = utf8
functionality as samba-2.2) on Mac OS X, and
they reported that it worked almost entirely. However, Windows clients
can't open files whose names contain "composite characters"
such as a-acute or Japanese KATAKANA with a trailing character.

Mr. SHIRAI analyzed this problem and found that
the UTF-8 locale in Mac OS X seems to use Unicode 3.0 and
"normalizes" Unicode strings into Normalization Form D.
See the following URL for more detail:

http://www.unicode.org/unicode/reports/tr15/

For example, the 'a-acute' letter (0x00e1) is decomposed into
the 'a' (0x0061) and 'acute' (0x0301) characters in Mac OS X.
(Actually, these characters are stored as UTF-8 in the filesystem;
 the character codes above are written in UCS-2 representation.)
That is, 'a-acute' and 'a'+'acute' are treated as identical in Mac OS X.
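
To make the decomposition concrete, here is a small sketch using Python's
standard unicodedata module (my choice for illustration only; it is not
what Mac OS X or Samba uses internally). It shows the code points and the
UTF-8 bytes before and after Normalization Form D, plus one KATAKANA case:

```python
import unicodedata

# a-acute as a single precomposed code point (U+00E1)
composed = "\u00e1"

# Normalization Form D splits it into base letter + combining mark
decomposed = unicodedata.normalize("NFD", composed)
code_points = [f"U+{ord(c):04X}" for c in decomposed]

# The same two strings as they would look when encoded as UTF-8
utf8_composed = composed.encode("utf-8")
utf8_decomposed = decomposed.encode("utf-8")

# KATAKANA GA (U+30AC) decomposes into KA (U+30AB) plus the
# combining voiced sound mark (U+3099)
katakana_points = [
    f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", "\u30ac")
]

print(code_points)       # ['U+0061', 'U+0301']
print(utf8_composed)     # b'\xc3\xa1'
print(utf8_decomposed)   # b'a\xcc\x81'
print(katakana_points)   # ['U+30AB', 'U+3099']
```

The two UTF-8 byte sequences differ, so a byte-for-byte filename
comparison treats them as different names.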

But since Windows NT (and Samba) do not seem to do "normalization",
they don't treat 'a-acute' and 'a'+'acute' as the same character,
and we can't access those filenames from Windows clients.
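
The mismatch, and one possible way around it, can be sketched as follows.
This is only an illustration in Python, not Samba code: names_match and
find_on_disk are hypothetical helpers, and the assumption is that the
server could normalize both the requested name and each directory entry
to the same form before comparing them.

```python
import unicodedata

def names_match(requested: str, on_disk: str) -> bool:
    """Compare two filenames after bringing both to Normalization Form D."""
    return (unicodedata.normalize("NFD", requested)
            == unicodedata.normalize("NFD", on_disk))

def find_on_disk(requested, directory_entries):
    """Return the stored name equivalent to the requested one, or None.

    Hypothetical helper: scans the entries and compares normalized forms.
    """
    wanted = unicodedata.normalize("NFD", requested)
    for entry in directory_entries:
        if unicodedata.normalize("NFD", entry) == wanted:
            return entry
    return None

# Name as a Windows client would send it (precomposed a-acute)
from_client = "caf\u00e1.txt"
# Name as stored by Mac OS X ('a' followed by combining acute)
stored = "cafa\u0301.txt"

plain_equal = (from_client == stored)               # False: code points differ
normalized_equal = names_match(from_client, stored)  # True after normalization
found = find_on_disk(from_client, ["readme.txt", stored])
```

Scanning a directory like this costs extra time on large directories, so
it is a starting point for discussion rather than a finished design.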

Does anyone have a good idea for avoiding this problem?
Or does the Samba Team have any plans to implement handling
of these characters?

///////////////////////////////////////////////////////////////
// Ryo Kawahara (rkawa at lbe.co.jp)
// website: http://www3.lbe.co.jp/~rkawa/
///////////////////////////////////////////////////////////////



