i18n question.
tridge at samba.org
Thu Mar 18 00:02:30 GMT 2004
Monyo,
> But actually some of Japanese characters(scripts) are case
> sensitive. For example U+FF21 (Fullwidth Latin Capital Letter A) and
> U+FF41 (Fullwidth Latin Small Letter A).
Interesting. Is this rare? If you have 1000 filenames on a filesystem
in Japanese how many of them would contain characters like this that
are case sensitive?
> This idea can be still useful, but for Japanese we cannot simply
> assume that non-ASCII chars are case insensitive.
That's fine, it just means that the caseless_index() function needs to be
a bit more complex. I suspect it will still be a big win.
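As a rough illustration of the extra complexity (in Python for brevity, and assuming that caseless_index() returns the position of the first character that has distinct upper and lower case forms, or -1 if there is none; that behaviour is my guess for this sketch, not the actual code):

```python
def caseless_index(name: str) -> int:
    """Return the index of the first character that has case, or -1.

    Hypothetical behaviour, assumed for illustration only.
    """
    for i, ch in enumerate(name):
        # A character "has case" if its lower and upper forms differ.
        # This is true for ASCII letters and also for fullwidth letters
        # such as U+FF21 / U+FF41, but not for kana, kanji or digits.
        if ch.lower() != ch.upper():
            return i
    return -1
```

A name made only of kanji gives -1, while a name containing U+FF21 gives the position of that character, so fullwidth Latin letters are treated the same way as ASCII ones.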
> |1) there are only 8 possible case combinations for a 3 letter
> | extension.
> |We could call stat() on all 8, and avoid the directory
>
> , first assuming all the 3 letters are lowercase and second are
> uppercase, most of extensions would be matched in those 2 cases.
That's not how it works. If the filename does exist then 99% of the
time we will find it on the first stat() call, either through a guess
or via the "stat cache" code.
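For reference, the 8 combinations mentioned in point 1) can be enumerated as in the sketch below. The helper name case_variants() is made up for this example, and the real code would be C rather than Python:

```python
from itertools import product


def case_variants(ext: str) -> list[str]:
    """Return every distinct upper/lower spelling of ext.

    A 3-letter ASCII extension gives 2 * 2 * 2 = 8 spellings; characters
    without case (digits, kanji) contribute only one choice each.
    """
    choices = [sorted({ch.lower(), ch.upper()}) for ch in ext]
    return ["".join(combo) for combo in product(*choices)]
```

Each spelling would then cost one stat() call, which is only worth paying when the rest of the name has no case of its own.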
The interesting case is where the file doesn't exist, and that is the
case that I am trying to improve with this scheme. About half the time
when a Windows client tries to open a file the filename does not
exist. The problem is proving with absolute certainty that it doesn't
exist. In English that means scanning the directory.
I hope that with this scheme we can avoid the scan even for files that
do not exist, as long as the filename uses caseless characters. I am
hoping that will be common enough in Japanese and Chinese to be
worthwhile.
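A minimal sketch of the idea, assuming the simplest case where the whole name is caseless. The function names are invented for this example, and lower() only approximates the case mapping a Windows client expects:

```python
import os


def has_case(name: str) -> bool:
    """True if any character has distinct upper and lower case forms."""
    return any(ch.lower() != ch.upper() for ch in name)


def lookup(directory: str, name: str):
    """Return the on-disk name matching name case-insensitively, or None."""
    if os.path.lexists(os.path.join(directory, name)):
        return name  # found on the first stat()
    if not has_case(name):
        # No other spelling of this name can exist, so one failed stat()
        # proves the file is absent and the directory scan is skipped.
        return None
    wanted = name.lower()
    for entry in os.listdir(directory):  # fallback: full directory scan
        if entry.lower() == wanted:
            return entry
    return None
```

With this, a missing file whose name is all kanji costs a single stat(), while a missing file with an ASCII or fullwidth Latin name still needs the scan.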
> |While I am here, I would like help from someone to convert a NBENCH
> |load file from English characters to Japanese or Chinese. That will
> |give us a benchmark to use for speed comparisons.
>
> Yasuma, How about this?
See my separate reply about this.
Cheers, Tridge