Mac OS X - compilation experiences and issues

Benjamin Riefenstahl Benjamin.Riefenstahl at epost.de
Thu Sep 11 15:15:02 GMT 2003


Hi everybody,


I noticed some time ago that the Samba 2 that Apple provides in Mac
OS X 10.2 doesn't support Unicode.  I compiled 3.0.0beta1 (that was
some time ago) and got it to work.  Now I have tried to update to the
current version and ran into a problem that I would like to bring to
your attention.

At the same time I would like to offer my Mac OS X patches and
additions if they are wanted.  I think it would be a good thing if
Samba compiled on Mac OS X out of the box, so that people who want
the most recent version do not have to depend on Apple.


Besides minor tweaks to configure.in, these are the main issues that
I encountered:

Native Kerberos support
-----------------------

I haven't been able to get the Kerberos support to compile with
beta1.  I don't need ADS, so I configured with --without-ads.  I have
not yet tried this with more current code.  I don't understand the
issues very well, much less the library APIs used, so I don't plan to
do anything about this.

Filesystem encoding
-------------------

Mac OS X filesystem APIs use decomposed UTF-8.  This presents two
problems.  The first is that a composing/decomposing charset module
is needed, because SMB clients (W2K in my experiments) assume
precomposed characters and do not work well with decomposed ones.  I
implemented a simple charset module on the basis of the CFString APIs
(CoreFoundation).
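
For illustration, here is a minimal sketch of such a composing
conversion using the CFString APIs.  The function name and buffer
convention are my own invention, not Samba's charset module
interface; it assumes CFStringNormalize, which is available since
Mac OS X 10.2:

#include <stddef.h>
#include <CoreFoundation/CoreFoundation.h>

/* Hypothetical helper: take decomposed UTF-8 as delivered by the
 * Mac OS X filesystem APIs and return precomposed UTF-8.  Returns
 * the number of bytes written to dst (not NUL-terminated). */
static size_t precompose_utf8(const char *src, size_t srclen,
                              char *dst, size_t dstlen)
{
    CFStringRef in;
    CFMutableStringRef work;
    CFIndex used = 0;

    in = CFStringCreateWithBytes(kCFAllocatorDefault,
                                 (const UInt8 *)src, (CFIndex)srclen,
                                 kCFStringEncodingUTF8, false);
    if (in == NULL)
        return 0;

    work = CFStringCreateMutableCopy(kCFAllocatorDefault, 0, in);
    CFRelease(in);

    /* Canonical composition: U+0041 U+0308 becomes U+00C4. */
    CFStringNormalize(work, kCFStringNormalizationFormC);

    CFStringGetBytes(work, CFRangeMake(0, CFStringGetLength(work)),
                     kCFStringEncodingUTF8, 0, false,
                     (UInt8 *)dst, (CFIndex)dstlen, &used);
    CFRelease(work);

    return (size_t)used;
}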

The second problem is the fast-path code that was implemented between
beta3 and rc3 (two versions that I have tried).  It assumes (in the
words of lib/charcnv.c):

 All multibyte sequences must start with a byte with the high bit set.

This condition just doesn't hold for decomposed Unicode.  "Multibyte
sequences" (in the sense of byte sequences that are converted/changed
by the charset module) often do start with a regular ASCII character
followed by a combining diacritic.  For example, the Unicode sequence
U+0041 U+0308 (encoded by Mac OS X as UTF-8, but still beginning with
\x41 == 'A') is converted by the charset module into U+00C4.

Due to the fast-path code, the first such decomposed character is not
composed, which causes severe problems for Windows.
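
To make the failure mode concrete, here is a simplified model of the
fast path (hypothetical code, not the actual lib/charcnv.c): it scans
for the first byte with the high bit set and treats everything before
it as plain ASCII.

#include <stddef.h>

/* Sketch of the fast-path assumption quoted above: bytes [0, i) are
 * copied verbatim, and only the rest is handed to the full charset
 * conversion. */
static size_t fast_path_prefix(const unsigned char *src, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (src[i] & 0x80)
            break;

    return i;
}

int main(void)
{
    /* Decomposed U+0041 U+0308 ('A' + combining diaeresis) in UTF-8 */
    const unsigned char name[] = { 0x41, 0xCC, 0x88 };

    /* fast_path_prefix() returns 1: the 'A' (0x41) is copied verbatim,
     * so the charset module only ever sees the lone combining mark
     * 0xCC 0x88 and can no longer compose it into U+00C4. */
    return fast_path_prefix(name, sizeof name) == 1 ? 0 : 1;
}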

I think several solutions are possible:

a) Disable the fast-path code completely, based on a #define/#ifdef.
   I plan to do measurements to see how much of an effect the code
   has in the first place.

b) Disable the fast-path code based on a #define/#ifdef, but only in
   the few functions where that is really needed.  This is based on
   the assumption that some or even most of the functions in question
   are used only internally, and that only some, or perhaps just one,
   are involved in the actual creation of the bytes sent out on the
   wire.  This carries the risk of inconsistency: some functions
   would convert one way, others differently.  That may lead to bugs
   in code that isn't explicitly tested (I immediately think of
   wildcard expansion, but there may be other code), or it may cause
   future bugs when functions are used differently than they are now.

c) Change the fast-path code to back up one character at the
   transition between the ASCII path and the non-ASCII path, possibly
   wrapped in an #ifdef again, to enable this only when needed (see
   the sketch after this list).
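
Here is a sketch of option (c), applied to the same hypothetical fast
path as above: at the switch to the non-ASCII path, back up one
character so that a combining mark reaches the charset module
together with its base character.

#include <stddef.h>

/* Like fast_path_prefix() above, but re-include the character before
 * the first high-bit byte, so a base character and its combining
 * mark are converted together. */
static size_t fast_path_prefix_backup(const unsigned char *src,
                                      size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (src[i] & 0x80)
            break;

    /* Back up one byte (one character, since everything before i is
     * ASCII), e.g. over the 'A' that precedes a combining diaeresis. */
    if (i > 0 && i < len)
        i--;

    return i;   /* bytes [0, i) can safely be copied verbatim */
}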


Let me know what you think, and in which direction you want to go.


benny
