i18n question.

Thu Mar 18 00:15:02 GMT 2004

Monyo,

 > |While I am here, I would like help from someone to convert a NBENCH
 > |load file from English characters to Japanese or Chinese. That will
 > |give us a benchmark to use for speed comparisons.
 > 
 > Yasuma, How about this?

I have created a file containing all of the English words used in a
NBENCH load file. The file is here:

  http://samba.org/ftp/tridge/dbench/nbench_words.txt

What we need is a file with the same number of words containing
Japanese, Chinese, Russian, etc. names that might be used as
filenames. The words do not have to be the same length as the English
words, but we do need the same number of words in total (i.e. 244
words). The words need to be written in UTF-8, although you may find
it easier to first write them in some other character set and then
use /usr/bin/iconv to convert to UTF-8.
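
For anyone who would rather script the conversion, here is a rough
sketch in Python of the same step: re-encode a word list to UTF-8 and
count the words so the total can be checked against 244. The function
names are made up for illustration, and the sketch assumes the words
are separated by whitespace, which may not match how you lay out your
own file.

```python
EXPECTED_WORDS = 244  # total stated above for nbench_words.txt


def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode a word list from source_encoding to UTF-8.

    Raises UnicodeDecodeError if raw is not valid in source_encoding.
    """
    return raw.decode(source_encoding).encode("utf-8")


def count_words(utf8_data: bytes) -> int:
    """Count whitespace-separated words in UTF-8 data.

    str.split() with no argument splits on any Unicode whitespace,
    so an ideographic space also counts as a separator here.
    """
    return len(utf8_data.decode("utf-8").split())


if __name__ == "__main__":
    # Hypothetical two-word list written in Shift-JIS.
    sample = "文書\n表\n".encode("shift_jis")
    converted = to_utf8(sample, "shift_jis")
    n = count_words(converted)
    print(n, "words;", "ok" if n == EXPECTED_WORDS else "count differs from 244")
```

This does roughly what "iconv -f SHIFT_JIS -t UTF-8" would do, with
the word count check added on top.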

I can then use sed on this file:

  http://samba.org/ftp/tridge/dbench/client_enterprise.txt

which is the load file for the new Samba4 NBENCH module. That will
produce a load file in each language. We can then use this load file
for benchmarking our improvements to the character set support in
Samba3 and Samba4.
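
To make the substitution step concrete, here is a minimal sketch in
Python. It is not the sed script itself, just one way the same idea
could be expressed: pair the English words with the translated words
by position, then replace each English word wherever it appears in a
line of the load file. The function names and the sample line are
hypothetical, and the rule of only matching a word when it is not
surrounded by other ASCII letters is an assumption of this sketch.

```python
import re


def build_mapping(english_words, translated_words):
    """Pair the two word lists by position; they must be the same length."""
    if len(english_words) != len(translated_words):
        raise ValueError(
            "word lists differ in length: %d vs %d"
            % (len(english_words), len(translated_words))
        )
    return dict(zip(english_words, translated_words))


def translate_line(line, mapping):
    """Replace every mapped English word in line with its translation."""
    # Try longer words first so a short word never pre-empts a longer
    # word that starts with the same letters.
    words = sorted(mapping, key=len, reverse=True)
    alternatives = "|".join(re.escape(w) for w in words)
    # Only match when the word is not embedded in a longer run of
    # ASCII letters (assumption of this sketch).
    pattern = re.compile(r"(?<![A-Za-z])(?:%s)(?![A-Za-z])" % alternatives)
    return pattern.sub(lambda m: mapping[m.group(0)], line)


if __name__ == "__main__":
    mapping = build_mapping(["WORD", "PM"], ["文書", "午後"])
    # Hypothetical path, not taken from the real load file.
    print(translate_line(r"\clients\WORD\PM.DOC", mapping))
```

Doing the replacement in a single pass means a translated word is
never rewritten a second time, which can happen with a chain of
separate substitutions.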

I'd also like to introduce Frankie Chow (CCd on this email). Frankie
has volunteered to do this for Chinese.

It would be particularly good if the word lists contained some
"difficult" characters, for example characters that cannot be
represented in UCS-2, characters such as the wide-A that are not
caseless, and characters whose encoding in some character set
contains an embedded '/' or '\' byte. That will allow us to test the
more complex code paths.
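
If it helps when picking words, the three kinds of "difficult"
character can be detected with a few lines of Python. This is only a
sketch: the function names are invented here, and Shift-JIS is used as
the example legacy character set purely as an assumption. Any other
multi-byte encoding that Python knows about could be passed instead.

```python
def outside_ucs2(word):
    """True if any character lies above U+FFFF, so UCS-2 cannot hold it."""
    return any(ord(ch) > 0xFFFF for ch in word)


def has_case(word):
    """True if any non-ASCII character changes under upper() or lower()."""
    return any(
        ord(ch) > 0x7F and (ch.lower() != ch or ch.upper() != ch)
        for ch in word
    )


def embedded_separator_bytes(word, encoding="shift_jis"):
    """True if a non-ASCII character encodes to bytes containing '/' or '\\'."""
    for ch in word:
        if ord(ch) <= 0x7F:
            continue  # a literal '/' or '\' is not the interesting case
        try:
            encoded = ch.encode(encoding)
        except UnicodeEncodeError:
            continue  # character does not exist in this character set
        if b"/" in encoded or b"\\" in encoded:
            return True
    return False


if __name__ == "__main__":
    for w in ["\U0002000B", "\uFF21", "表", "あ"]:
        print(repr(w), outside_ucs2(w), has_case(w), embedded_separator_bytes(w))
```

A word list that makes each of these checks return true at least once
should cover all three of the code paths mentioned above.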

I think this will be a good beginning towards an i18n test suite for
Samba. We will probably use these word lists in other tests as well.

Cheers, Tridge