Managing DNs in libads only in utf8

Tue Feb 27 07:57:45 GMT 2007

Jeremy,

 > Right now it's simple - internal -> onto wire means
 > convert from unix -> utf8, wire -> internal means
 > convert utf8 -> unix. If you blur that boundary
 > I think you will break more than you fix.

Yep, I agree. That simple rule has served us well, and I'd like to see
a much more detailed justification for moving away from it than
what Simo has given so far.

Could we instead just ensure we emit a prominent warning when the
conversion routines hit a character they can't handle? Maybe something
like:

 WARNING: you tried to convert a string from UTF8 to ASCII that cannot
 be represented in ASCII. We strongly suggest you check your "unix
 charset" setting, and ensure it can handle all characters used in
 your system.

Too verbose, maybe?
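
As a rough illustration of the warning idea, here is a minimal sketch
built on plain POSIX iconv(3) rather than Samba's own conversion
routines. The function name convert_with_warning and the exact warning
text are assumptions made up for this example; only iconv_open(),
iconv() and iconv_close() are real calls. Some iconv implementations
report an unrepresentable character as EILSEQ, while others substitute
a placeholder and return a non-zero count, so the sketch treats both as
a failed conversion.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert NUL-terminated src from charset `from`
 * to charset `to`, writing a NUL-terminated result into dst.
 * Returns 0 on a clean conversion, -1 otherwise. */
int convert_with_warning(const char *from, const char *to,
                         const char *src, char *dst, size_t dstlen)
{
    assert(from != NULL && to != NULL && src != NULL && dst != NULL);
    if (dstlen == 0) {
        return -1;
    }

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1) {
        fprintf(stderr, "WARNING: no conversion available from %s to %s\n",
                from, to);
        return -1;
    }

    char *in = (char *)src;        /* iconv() takes a non-const pointer */
    size_t inleft = strlen(src);
    char *out = dst;
    size_t outleft = dstlen - 1;   /* keep room for the terminator */

    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    int saved_errno = errno;
    iconv_close(cd);
    *out = '\0';

    if (rc == (size_t)-1 && saved_errno != EILSEQ) {
        return -1;                 /* truncated input or output too small */
    }
    if (rc != 0) {
        /* EILSEQ, or a lossy substitution was performed */
        fprintf(stderr,
                "WARNING: you tried to convert a string from %s to %s that\n"
                "cannot be represented in %s. Check your \"unix charset\"\n"
                "setting.\n", from, to, to);
        return -1;
    }
    return 0;
}
```

Calling convert_with_warning("UTF-8", "ASCII", "hello", buf, sizeof(buf))
should succeed, while a string containing a non-ASCII character such as
e-acute should print the warning and return -1.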

Meanwhile, way out in left field, I did mention to Simo on IRC an idea
of how we could handle this in another way. Our core problem is that
we can't tell what charset a "char*" is. Well, with talloc we could
tell, by making the talloc "type" of the string be the charset. At the
moment we make the "type" be the contents of the string, just because
there wasn't anything else useful to put in there.

I'm not actually sure I like this idea, even though I'm proposing
it. The upside is that our string routines could always tell what
charset a string is in. The downside is a rather intimate relationship
between strings and talloc, plus problems with strings that don't
start on a talloc boundary, and ensuring that all strings have the
charset set.

The other big downside is that it moves us more towards "C as an
interpreted language" via non-standard, magic constructs. Maybe that's
enough to kill the idea in itself.
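
To make the trade-off a bit more concrete, here is a small sketch of
the "charset travels with the allocation" idea. It deliberately does
not use talloc: it is a hypothetical stand-in that keeps a charset tag
in a hidden header in front of the string, roughly in the spirit of the
name talloc keeps for each chunk. The names tagged_strdup,
tagged_charset and tagged_free are invented for this example. With real
talloc one would presumably reach for talloc_set_name_const() and
talloc_get_name() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an allocator's chunk header. */
struct tagged_hdr {
    const char *charset;   /* must point at storage that outlives the string */
};

/* Copy s into a new allocation whose hidden header records its charset. */
char *tagged_strdup(const char *s, const char *charset)
{
    assert(s != NULL && charset != NULL);
    size_t len = strlen(s) + 1;
    struct tagged_hdr *h = malloc(sizeof(*h) + len);
    if (h == NULL) {
        return NULL;
    }
    h->charset = charset;
    char *p = (char *)(h + 1);     /* string data follows the header */
    memcpy(p, s, len);
    return p;
}

/* Look up the charset of a string returned by tagged_strdup().
 * Only valid for the exact pointer that tagged_strdup() returned. */
const char *tagged_charset(const char *s)
{
    assert(s != NULL);
    const struct tagged_hdr *h = (const struct tagged_hdr *)s - 1;
    return h->charset;
}

/* Release a string returned by tagged_strdup(). */
void tagged_free(char *s)
{
    if (s != NULL) {
        free((struct tagged_hdr *)s - 1);
    }
}
```

The sketch also shows the downsides mentioned above: tagged_charset(s + 1)
or tagged_charset("a literal") reads memory that is not a header at all,
and nothing forces every string in the program to have been created
through tagged_strdup() in the first place.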

Cheers, Tridge