[cifs-protocol] Active Directory server sort Unicode normalization

Douglas Bagnall douglas.bagnall at catalyst.net.nz
Tue Jan 19 00:04:36 UTC 2016


hi Dochelp,

I am trying to improve server sort for Samba. Is there a
machine-readable version of the file "Active Directory Sort Table
v02.pdf" from http://www.microsoft.com/en-us/download/details.aspx?id=1175
anywhere, and if so, under what license can it be used?

I'm thinking of something akin to the text files in
http://www.microsoft.com/en-us/download/details.aspx?id=10921.
Alternatively, is one of these more or less the same as the PDF (which
*looks* to be the case)? And how are these licensed?

Or is there a description somewhere of how to derive these tables
from the Unicode documents?

Another question: does Active Directory have any way of sorting
characters outside the Basic Multilingual Plane? Following RFC 4518,
the character "🅌" ("SQUARED SD" https://codepoints.net/U+1F14C) would
be NFKC normalized and sort equivalently to "SD", but I can't see how
Windows would deal with that.

In general it looks like Windows *almost* follows the RFC, but retains
a bit of low precedence information in the case weight field
(distinguishing e.g. the plain digit 1 from the superscript digit ¹)
that would be lost to strict NFKC normalization. And it stops at
0xFFFF. Is that a fair summary?
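
As a rough illustration of what I mean by "low precedence
information", here is a Python sketch of a two-level sort key. This is
an assumption about the general shape of the behaviour, not the
Windows algorithm or its real weight fields: the primary level is the
NFKC form, and the original string is kept only as a tie-breaker. The
function name two_level_key is hypothetical.

```python
import unicodedata


def two_level_key(s):
    """Hypothetical key: NFKC form first, original string as tie-break."""
    primary = unicodedata.normalize("NFKC", s)
    return (primary, s)


SUPERSCRIPT_ONE = "\u00b9"  # SUPERSCRIPT ONE

# Strict NFKC comparison cannot tell the two digits apart.
strict_equal = (unicodedata.normalize("NFKC", SUPERSCRIPT_ONE)
                == unicodedata.normalize("NFKC", "1"))

# The two-level key keeps them adjacent but still distinct.
ordered = sorted([SUPERSCRIPT_ONE, "2", "1"], key=two_level_key)

print(strict_equal)  # True
print(ordered)       # ['1', '¹', '2']
```

With strict NFKC the two digits are indistinguishable; with the extra
level they sort together ahead of "2" but remain separate entries.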

thanks,
Douglas