Managing DNs in libads only in utf8

simo idra at samba.org
Tue Feb 27 13:50:09 GMT 2007


On Tue, 2007-02-27 at 18:57 +1100, tridge at samba.org wrote:
> Jeremy,
> 
>  > Right now it's simple - internal -> onto wire means
>  > convert from unix -> utf8, wire -> internal means
>  > convert utf8 -> unix. If you blur that boundary
>  > I think you will break more than you fix.
> 
> yep, I agree. That simple rule has served us well and I'd like to see
> a much more detailed justification for changing away from this than
> what Simo has given so far. 

See again my original mail, and the answer to Jeremy. I am fixing a bug
caused by unnecessary conversions. I think we can safely shift the
boundary in this case as DNs are very well defined and limited to
libads.

I agree and supported the decision of how the boundaries were created
(you seem to always forget we 2 have had a lot of conversations about
this problem face to face at a CIFS conference and at a sambaXP
conference).
But I am not proposing a big boundary shift, seem that here everyone is
just going in to panic mode without looking at the patch, and my
explanation of what I am changing and that I want to keep this confined.

Sure I will look into shifting the boundary some more probably, but as I
said, each time I will propose a clean patch that show the boundary can
be safely moved without leaking anything in other code, but this is not
the case, forget any major code change, I am being very specific and the
patch is actually needed to fix a bug (3675) that was seen in the wild.

I am not proposing this right now, but I think we should think of making
the whole winbindd code utf8 internally, winbindd do not deal with files
it should be sane to do so, communication with winbindd would be
considered the boundary and conversions would happen on its boundaries
(ie smbd, nss_winbindd, pam_winbindd, calls to winbindd: anything that
goes into the winbindd pipe). But please do not take this point as the
culprit, the patch is about a specific bug, please comment that, we can
leave this discussion to another thread.

> Could we instead just ensure we emit a prominent warning when the
> conversion routines hit a character they can't handle? Maybe something
> like:
> 
>  WARNING: you tried to convert a string from UTF8 to ASCII that cannot
>  be represented in ASCII. We strongly suggest you check your "unix
>  charset" setting, and ensure it can handle all characters used in
>  your system.
> 
> too verbose maybe?

So you prefer not to fixing a group membership bug in winbindd and
instead constantly throw this error? You know often samba admins has no
leverage against the directory admins, you will just leave them with yet
another (easily fixable) inconsistency plus a lot of noise in the logs.

> Meanwhile, way out in left field, I did mention to Simo on IRC an idea
> of how we could handle this in another way. Our core problem is that
> we can't tell what charset a "char*" is. Well, with talloc we could
> tell, by making the talloc "type" of the string be the charset. At the
> moment we make the "type" be the contents of the string, just because
> there wasn't anything else useful to put in there.
> 
> I'm not actually sure I like this idea, even though I'm proposing
> it. The upside is that our string routines could always tell what
> charset a string is in. The downside is a rather intimate relationship
> between strings and talloc, plus problems with strings that don't
> start on a talloc boundary, and ensuring that all strings have the
> charset set.

No, this would mean we have to change all our string manipulation
functions (the one that deal with case at least), to check the type and
act accordingly. But as you say we would have the problem of strings
that start from a non talloc boundary, we have tons of them, we would
need to increase considerably the allocation rate just for that. I find
this way too insane.

> The other big downside is that it moves us more towards "C as an
> interpreted language" via non-standard, magic constructs. Maybe that's
> enough to kill the idea in itself.

It's not that, it has to do with the amount of code to change and coding
habbits, this change would be much more problematic than just shifting
the boundary from the wire to syscalls.

Instead to help people understand DNs as special string I'd like to
introduce something like this to define them in function prototypes and
in the code:

#define smb_utf8_t char

This way a user of the libads code, will see immediately and without
doubt that DNs are utf8 strings. I know this is just advisory, but we do
not play with DNs so much to worry about their nature so strongly.

Please lets not get prisoners of our own conventions. I agree we must be
careful about charset boundaries, but I am confident this case is ok and
it is the right thing to do. It fixes a real bug, and there is no chance
it leaks utf8 strings in other parts of the code, I carefully checked
that. Reject on the merit, not in principle if you have to.

Simo.

-- 
Simo Sorce
Samba Team GPL Compliance Officer
email: idra at samba.org
http://samba.org



More information about the samba-technical mailing list