i18n question.

Shiro Yamada shiro at miraclelinux.com
Wed Mar 3 09:28:23 GMT 2004


Hello there,

Jeremy Allison wrote:

> On Wed, Mar 03, 2004 at 10:50:58AM +0900, Kenichi Okuyama wrote:
> 
>>Dear Jeremy,
>>
>># Have you set 'Reply-To:'?
>>
>>
>>
>>>>>>>"JA" == Jeremy Allison <jra at samba.org> writes:
>>>>
>>>>He covered CP932 (Japanese), UHC (Korean), GB18030 (Simplified
>>>>Chinese) and Big5 (Traditional Chinese).  He says, that "/" is not
>>>>used by any of these, but "\" is used as a trail byte in CP932,
>>>>GB18030 and Big5.
>>
>>JA> Oh well, these character sets are going to be quite slow then :-(.
>>JA> I'll add the code without the special case for the broken char
>>JA> sets and then fix it up afterwards.
>>
>>Easiest way to solve that problem is to use UCS2, or UTF8
>>as internal character coding.
> 
> 
> Actually, I managed to make even the "slow" case not so bad,
> by using the property that when parsing pathnames (where we
> only care about the characters '.', '\\' and '/' that the '\\'
> character is the only one that can occur as the second part
> of a mb-encoded string).
> 
> I'm testing the code to make sure I have the correct semantics
> with sb-encoding right now, I'd appreciate some help once
> I've checked it in to make sure it works with the problematic
> mb encodings.
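
The trail-byte problem described above can be sketched as follows.
This is only an illustration, not the Samba code: the function names
are hypothetical, and the lead-byte ranges (0x81-0x9F, 0xE0-0xFC)
are the ones commonly documented for CP932.

```python
from typing import List

BACKSLASH = 0x5C
SLASH = 0x2F


def is_cp932_lead(b: int) -> bool:
    """True if b starts a two-byte CP932 character (assumed ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def split_cp932_path(raw: bytes) -> List[bytes]:
    """Split a CP932 byte string on '\\' and '/', skipping trail bytes."""
    parts = []
    start = 0
    i = 0
    while i < len(raw):
        if is_cp932_lead(raw[i]) and i + 1 < len(raw):
            # Two-byte character: the second byte may be 0x5C,
            # but it is not a path separator, so skip over it.
            i += 2
            continue
        if raw[i] in (BACKSLASH, SLASH):
            parts.append(raw[start:i])
            start = i + 1
        i += 1
    parts.append(raw[start:])
    return parts


# U+8868 encodes to the bytes 0x95 0x5C in CP932.
raw = "dir\\\u8868.txt".encode("cp932")
naive = raw.split(b"\\")        # cuts the character in half
aware = split_cp932_path(raw)   # keeps it intact
```

A plain byte-wise split yields three pieces here because it treats
the trail byte as a separator; the lead-byte-aware scan yields two.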

We have developed an automated MB testing environment, so I think
we can help you in this area. May I also share my thoughts on
Samba's MB features here?

Maintaining two separate code paths, one fast and one slow, is not
good for Samba. Although it may satisfy the need to handle MB
characters, it will sacrifice the leanness of Samba and make it
prone to bugs.

On the other hand, we want support for these special characters for
completeness.

Considering these two rather conflicting factors, the only sensible
way to solve these issues is to unify Samba's internal character
encoding to Unicode (UCS2), whatever the user has specified as
`unix charset', as Kenichi Okuyama has suggested. If we do that, it
will improve not only the performance of Samba's MB handling but
also the maintainability of Samba, as we won't need to maintain two
near-identical functions, one for ASCII and the other for MB
strings. All string comparisons, standardisations and replacements
would be done in Unicode.
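
A minimal sketch of that idea, assuming the conversion happens once
at the boundary: Python's str type stands in for an internal UCS2
buffer here, and split_path is a hypothetical name, not a Samba
function.

```python
from typing import List


def split_path(raw: bytes, unix_charset: str) -> List[str]:
    """Decode once at the boundary, then work on Unicode only."""
    text = raw.decode(unix_charset)
    # After decoding, '\\' can only match a real backslash character,
    # never the second byte of a multibyte character.
    return text.replace("/", "\\").split("\\")


# The same routine serves every charset; no separate "slow" path.
jp = split_path("dir\\\u8868.txt".encode("cp932"), "cp932")
tw = split_path("dir\\\u529f.txt".encode("big5"), "big5")
en = split_path(b"dir\\file.txt", "ascii")
```

Each call returns two components, whichever `unix charset' the bytes
came from, because the separator test no longer looks at raw bytes.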

Having said that, I am aware that it would require a major
restructuring of the source code. It may not be possible in the
current Samba 3.0, but I would like to see it happen in the future.

Anyhow, we'll do the testing for you; please let us know when it's
ready.

Regards,
--
Shiro Yamada
shiro at miraclelinux.com




More information about the samba-technical mailing list