utf8 vs ucs2

NEZU, Kensuke nez at samba.gr.jp
Wed May 23 12:11:42 GMT 2001


Andrew Tridgell wrote:
> 
> > typedef struct sambastring_tag
> > {
> >       UINT16* buffer;
> >       int length;
> >       ...
> > } sambastring;
> 
> yes, this is similar to what Tim proposed a while back. It would be a
> good thing but it does require a *lot* of code rewriting. The
> pstring/fstring stuff is stack allocated, and changing to explicit
> allocation would require a lot of thought.

In addition to what Ryo has already proposed, I would also propose adopting
this strategy at least for newly written code in future releases.
In your plan, Andrew, I think translation issues will still remain between
the 'internal' and 'os' strings.
The same applies to the migration step of your plan, with the translation
between 'wire' UCS2 and 'internal' UTF-8.
The problem occurs when the length of a string before translation differs
from its length after translation. In these cases, each translation function
has to deal *individually* with the buffer overrun problem. That is very
hard work, and it easily introduces bugs.
It does not seem very smart. And it seems too difficult to pick this logic
out of each function later, especially where the translation involves an
'unfamiliar language/coding system', when you change from the
pstring/fstring stuff to the new stuff.
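
To illustrate the length problem, here is a minimal sketch in standard C.
The names ucs2_to_utf8_bounded and utf8_len_of_ucs2 are hypothetical and are
not existing Samba functions; surrogate pairs are deliberately ignored. It
only shows that one UCS2 code unit (2 bytes) can become 1, 2 or 3 bytes of
UTF-8, so the converter has to check the destination size for every
character instead of trusting the source length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Samba function): number of UTF-8 bytes
 * needed for one UCS2 code unit. Surrogates are not handled here. */
static size_t utf8_len_of_ucs2(uint16_t c)
{
    if (c < 0x80)  return 1;
    if (c < 0x800) return 2;
    return 3;
}

/* Convert n UCS2 code units into NUL-terminated UTF-8 in dst, which has
 * room for dst_size bytes. Returns the number of bytes written (without
 * the NUL), or (size_t)-1 if the result does not fit. It never writes
 * past dst + dst_size. */
size_t ucs2_to_utf8_bounded(const uint16_t *src, size_t n,
                            char *dst, size_t dst_size)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    if (dst_size == 0)
        return (size_t)-1;

    for (size_t i = 0; i < n; i++) {
        uint16_t c = src[i];
        size_t need = utf8_len_of_ucs2(c);

        /* Reserve one extra byte for the terminating NUL. */
        if (out + need + 1 > dst_size) {
            dst[0] = '\0';
            return (size_t)-1;
        }
        if (need == 1) {
            dst[out++] = (char)c;
        } else if (need == 2) {
            dst[out++] = (char)(0xC0 | (c >> 6));
            dst[out++] = (char)(0x80 | (c & 0x3F));
        } else {
            dst[out++] = (char)(0xE0 | (c >> 12));
            dst[out++] = (char)(0x80 | ((c >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (c & 0x3F));
        }
    }
    dst[out] = '\0';
    return out;
}
```

For example, the two characters U+65E5 U+672C occupy 4 bytes as UCS2 but
6 bytes as UTF-8, so a destination sized from the source byte count would be
too small.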

To avoid this bad situation, I would like to recommend using wrapper
functions that look something like the following:

#ifdef ASSUME_MULTIBYTE_HAS_DIFFERENT_LENGTH
typedef struct sambastring_tag { ... } sambastring;
#define samba_pstring(pstr, str) sambastring_to_pstring((pstring *)(pstr), \
	(sambastring *)(str))
#define samba_sambastring(str, pstr) sambastring_from_pstring((sambastring *)(str), \
	(pstring *)(pstr))
#else
typedef pstring sambastring;
#define get_sambastring_buffer(any) /* */
#define samba_pstring(pstr, str) ((pstring *)(pstr))
#define samba_sambastring(str, pstr) ((pstring *)(pstr))
#endif

In any function:

sambastring str1, str2;
pstring pstr;

get_sambastring_buffer(&str1); get_sambastring_buffer(&str2);
any_old_function(
  samba_pstring(pstr, new_function(samba_sambastring(&str1, pstr), str2))
);
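
As a rough sketch only, the two wrapper functions might look like this when
ASSUME_MULTIBYTE_HAS_DIFFERENT_LENGTH is defined. Everything here is an
assumption for illustration: the struct fields follow the structure quoted
at the top of this mail, pstring is taken to be a 1024-byte char array,
sambastring_free is a hypothetical helper, and the conversion is simplified
to 7-bit ASCII so that the allocation and length checks stay visible. A real
version would call the proper charset conversion instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef char pstring[1024];   /* assumption: fixed-size, stack-allocated string */

typedef struct sambastring_tag {
    uint16_t *buffer;   /* heap-allocated code units, not NUL-terminated */
    size_t    length;   /* number of code units stored in buffer */
} sambastring;

/* Hypothetical helper: release the buffer and reset the string to empty. */
void sambastring_free(sambastring *s)
{
    assert(s != NULL);
    free(s->buffer);
    s->buffer = NULL;
    s->length = 0;
}

/* Copy a pstring into dst, allocating exactly as much as needed.
 * dst must already be initialized (buffer NULL or owned by dst).
 * Returns dst, or NULL on failure. ASCII-only simplification. */
sambastring *sambastring_from_pstring(sambastring *dst, pstring *src)
{
    size_t n = 0;
    uint16_t *buf;

    assert(dst != NULL && src != NULL);
    while (n < sizeof(pstring) && (*src)[n] != '\0')
        n++;
    for (size_t i = 0; i < n; i++)
        if ((unsigned char)(*src)[i] > 0x7F)
            return NULL;            /* non-ASCII is out of scope for this sketch */

    buf = malloc((n ? n : 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)(*src)[i];

    free(dst->buffer);
    dst->buffer = buf;
    dst->length = n;
    return dst;
}

/* Copy a sambastring back into a fixed-size pstring.
 * Returns dst, or NULL if the result would not fit or is not ASCII;
 * dst is left untouched on failure. */
pstring *sambastring_to_pstring(pstring *dst, sambastring *src)
{
    assert(dst != NULL && src != NULL);
    if (src->length >= sizeof(pstring))
        return NULL;                /* no room for the data plus the NUL */
    for (size_t i = 0; i < src->length; i++)
        if (src->buffer[i] > 0x7F)
            return NULL;
    for (size_t i = 0; i < src->length; i++)
        (*dst)[i] = (char)src->buffer[i];
    (*dst)[src->length] = '\0';
    return dst;
}
```

The point of the sketch is that the overflow check lives in one place, inside
the wrapper, instead of being repeated in every function that translates
strings.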

It has four advantages:
First, we don't have to worry about whether a new function is safe with the
future string structure.
Second, these wrappers act as glue, so incremental conversion becomes possible.
Third, once all the conversion is finished, we will not have to touch
time-tested, working code. We only have to disable these macros by defining
them as /* */. :-)
Finally, people who have no need to care about such things suffer no
disadvantage from this design.

> The big step for me was realising that we don't have to do this string
> structure change at the same time as the change to ucs2/utf8. So we
> can keep going with the old pstring/fstring stuff until we have the
> string formats sorted out, then deal with the allocation and string
> structure problem later. That makes the problem much more tractable.

OK, that is reasonable enough. Doing "one thing" at a time keeps the world simple.
But it is also a chance to begin.
I would only say: please keep in mind that there are people who have run into,
and will keep running into, this problem in real life.

--
----------
Kensuke Nezu, nez at samba.gr.jp
Auditor , Samba Users Group in Japan



