Unicode problem in 2.0.7-pre1

Hiroshi MIURA miura at blue.gr.jp
Sun Feb 27 09:02:13 GMT 2000


Hello, 

I found a bug in utils/make_unimap.c in samba-2.0.7-pre1.
It uses the codepages/CP???.TXT files that come from Unicode.org.
These tables give the mapping from a codepage to Unicode.
But in some cases you cannot determine the mapping from Unicode back to
the codepage, because CP932 has some "1:n" mappings from Unicode to the
codepage: one Unicode character corresponds to several CP932 codes.
How do you judge which one to adopt?
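
The "1:n" case can be illustrated with a minimal sketch in C. This is not
the code from make_unimap.c or make_unimap2.c: the struct, the function
names and the three-row table are hypothetical, and the Unicode values are
assumed to match what CP932.TXT lists for these codes. It only shows that
inverting a codepage-to-Unicode table gives different answers for the same
Unicode character depending on which row is allowed to win.

```c
#include <assert.h>
#include <stddef.h>

/* One row of a codepage-to-Unicode table (hypothetical layout). */
struct cp_row {
    unsigned short cp;   /* CP932 double-byte code */
    unsigned short ucs;  /* Unicode code point */
};

/* Illustrative excerpt only; values assumed from CP932.TXT. */
static const struct cp_row sample_rows[] = {
    { 0x8754, 0x2160 },  /* NEC extension: Roman numeral one */
    { 0x8755, 0x2161 },  /* NEC extension: Roman numeral two */
    { 0xFA4A, 0x2160 },  /* IBM extension: Roman numeral one again */
};
#define SAMPLE_COUNT (sizeof sample_rows / sizeof sample_rows[0])

/* Count how many codepage codes map to the given Unicode code point. */
static size_t count_cp_for_ucs(const struct cp_row *rows, size_t n,
                               unsigned short ucs)
{
    size_t i, hits = 0;

    assert(rows != NULL);
    for (i = 0; i < n; i++)
        if (rows[i].ucs == ucs)
            hits++;
    return hits;
}

/*
 * Reverse lookup. If first_wins is non-zero the first matching row is
 * returned, otherwise the last one. Returns 0 when nothing matches.
 */
static unsigned short cp_for_ucs(const struct cp_row *rows, size_t n,
                                 unsigned short ucs, int first_wins)
{
    size_t i;
    unsigned short found = 0;

    assert(rows != NULL);
    for (i = 0; i < n; i++) {
        if (rows[i].ucs != ucs)
            continue;
        found = rows[i].cp;
        if (first_wins)
            break;
    }
    return found;
}
```

With this excerpt, cp_for_ucs(sample_rows, SAMPLE_COUNT, 0x2160, 1) gives
0x8754 and the same call with 0 gives 0xFA4A, so a reverse table needs an
explicit rule for which code to keep.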

I made a Unicode-to-CP932 mapping table, "CP932UC.txt", wrote code to
handle this table, "make_unimap2.c", and made some patches to "installcp.sh".

I also made an L10N patch to "lib/kanji.c" that fixes the problem
described in my earlier message (quoted below).

They are at
http://plaza24.mbn.or.jp/~moh/samba/samba-pre2.0.7-JP.patch

Please fix samba-2.0.7-pre1....

MIURA, SAMBA-JP staff

In message "Japanese SJIS reguration problem"
    on 00/02/10, Hiroshi MIURA <miura at blue.gr.jp> writes:
> Hello,
> 
> I have found some problems and made a patch.
> We, the technical team of the SAMBA user community in Japan,
> are now testing this patch for portability.
> 
> The patch is attached. It is for samba-2.0.5.
> 
> The problems are described below.
> 
> =One Problem is...
> 
> For historical reasons, we have a problem with SJIS code regulation.
> What is it? Long ago, each computer maker in Japan defined
> its own extension of KANJI codes in addition to JIS (Japanese Industrial
> Standard). Shift JIS, also called MS-Kanji, is no exception.
> 
> The important extensions are
> 'NEC kanji', 'NEC selected IBM extension kanji code', 
> 'IBM extension kanji code'.
> 
> MSKK (Microsoft Japan) adopted these three extensions in the Windows
> code set of Windows 3.0J.
> But as a result there are duplicated codes: the same glyph with different codes.
> 
> MS NT4 (Japanese) unifies these codes in one way, but
> Windows 98 and newer MS OSes unify them in another way :-(
> 
> For example, 0x8754 in SJIS is the Roman numeral 'one', which looks like 'I'.
> NT4 uses this code, 0x8754, but Windows 98 uses 0xfa4a.
> These two codes look the same, like 'I'.
> 
> e.g. 
> 
> 1) I create the file 'I' (code 0x8754) on the Samba file server 
>    from an NT4 workstation. 
> 2) I want to open the file 'I' on Windows 98.
> 3) Windows 98 unifies it to code 0xfa4a.
> 4) Windows 98 asks Samba to open the file named '0xfa4a'.
> 5) Samba doesn't have it. Samba has 'I' as 0x8754.
> 6) As a result, it fails. 
> 
> 
> =Another problem 
> 
> There are SJIS codes that we cannot map to EUC.
> These codes are the extensions described above. This patch
> unifies these codes into the defined code area.
> But this rule is different from MS's rule.
> 
> =Solution 
>   
> When coding system = CAP, HEX or SJIS, we unify these codes in MS's 
> recommended way.
> 
> When coding system = EUC or JIS, we unify some of the codes in our 
> own way.
> 
> Thanks,
> 
 MIURA, Hiroshi
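
The failing lookup in steps 1) to 6) of the quoted message, and the idea of
unifying duplicated codes before comparing names, can be sketched as
follows. This is only an illustration, not the patch itself: the alias
table holds just the one pair mentioned in the message, the direction of
the mapping (0xFA4A to 0x8754) is an arbitrary assumption and not
necessarily the MS-recommended one, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A duplicated SJIS code and the code it is unified to (hypothetical). */
struct sjis_alias {
    unsigned short from;
    unsigned short to;
};

/* Incomplete table: only the pair from the message; direction assumed. */
static const struct sjis_alias aliases[] = {
    { 0xFA4A, 0x8754 },
};

/* Lead bytes of double-byte characters in Shift JIS / CP932. */
static int is_sjis_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Return the unified code, or the code itself if it has no alias. */
static unsigned short unify_code(unsigned short code)
{
    size_t i;

    for (i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
        if (aliases[i].from == code)
            return aliases[i].to;
    return code;
}

/* Rewrite a NUL-terminated SJIS file name in place. */
void sjis_unify_name(unsigned char *name)
{
    assert(name != NULL);
    while (*name != '\0') {
        if (is_sjis_lead(name[0]) && name[1] != '\0') {
            unsigned short code =
                unify_code((unsigned short)((name[0] << 8) | name[1]));

            name[0] = (unsigned char)(code >> 8);
            name[1] = (unsigned char)(code & 0xFF);
            name += 2;  /* skip the trail byte so it is not re-read */
        } else {
            name++;     /* single-byte character */
        }
    }
}
```

If both the name stored on disk and the name requested by the client pass
through such a function, the NT4 form and the Windows 98 form of 'I' end
up as the same bytes, so the comparison in step 5) would succeed.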







