[cifs-protocol] 116011913598783 Active Directory server sort Unicode normalization
edgaro at microsoft.com
Tue Jan 19 18:08:29 UTC 2016
I'll review and follow up.
From: Sreekanth Nadendla
Sent: Monday, January 18, 2016 7:21 PM
To: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Cc: MSSolve Case Email <casemail at microsoft.com>; cifs-protocol at lists.samba.org
Subject: 116011913598783 Active Directory server sort Unicode normalization
Casemail in Cc
Dochelp in Bcc
Thank you for your inquiry about Active Directory Specifications. We have created incident # 116011913598783 to investigate this issue. A member of the Open Specifications team will contact you shortly.
Microsoft Windows Open specifications
From: Douglas Bagnall [mailto:douglas.bagnall at catalyst.net.nz]
Sent: Monday, January 18, 2016 4:05 PM
To: Interoperability Documentation Help <dochelp at microsoft.com>
Cc: cifs-protocol at lists.samba.org
Subject: Active Directory server sort Unicode normalization
I am trying to improve server sort for Samba. Is a machine-readable version of the file "Active Directory Sort Table v02.pdf" from http://www.microsoft.com/en-us/download/details.aspx?id=1175 available anywhere?
If so, is there a license for its use?
I'm thinking of something akin to the text files in http://www.microsoft.com/en-us/download/details.aspx?id=10921.
Alternatively, is one of those text files more or less the same as the PDF (which *looks* to be the case)? And how are they licensed?
Or is there a description somewhere of how to derive these tables from the Unicode documents?
Another question: does Active Directory have any way of sorting characters outside the Basic Multilingual Plane? Following RFC 4518, the character "🅌" ("SQUARED SD" https://codepoints.net/U+1F14C) would be NFKC-normalized and sort equivalently to "SD", but I can't see how Windows would deal with that.
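A minimal Python sketch of the RFC 4518 behaviour described above, using only the standard `unicodedata` module. `rfc4518_like_key` and `utf16_units` are hypothetical helpers, and the key is only an approximation of RFC 4518 (NFKC plus Python's full case folding; the RFC's other string-preparation steps are omitted). It shows that U+1F14C normalizes to "SD" and that, lying above 0xFFFF, it needs a UTF-16 surrogate pair, so a table indexed by a single 16-bit code unit (an assumption about how a BMP-only table is keyed) would have no slot for it.

```python
import unicodedata

def rfc4518_like_key(s: str) -> str:
    # Approximation only: NFKC normalization followed by case folding.
    # Real RFC 4518 preparation also maps, prohibits and handles
    # insignificant spaces; none of that is modelled here.
    return unicodedata.normalize("NFKC", s).casefold()

def utf16_units(s: str) -> int:
    # Number of 16-bit code units needed to store s in UTF-16.
    return len(s.encode("utf-16-le")) // 2

squared_sd = "\U0001F14C"
print(unicodedata.name(squared_sd))      # SQUARED SD
print(ord(squared_sd) > 0xFFFF)          # True: outside the BMP
print(utf16_units(squared_sd))           # 2: a surrogate pair
print(rfc4518_like_key(squared_sd) == rfc4518_like_key("SD"))  # True
```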
In general it looks like Windows *almost* follows the RFC, but retains a bit of low-precedence information in the case weight field (distinguishing, e.g., the plain digit 1 from the superscript digit ¹) that strict NFKC normalization would lose. And it stops at 0xFFFF. Is that a fair summary?
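To make that summary concrete, here is a hypothetical two-level sort key in Python. It is not a description of the actual Windows sort tables: `two_level_key` is an invented name, and the single per-character flag is an assumed stand-in for whatever low-precedence weight Windows stores. Strict NFKC alone makes "1" and "¹" identical; the second tuple element keeps them distinguishable as a tie-breaker without changing the primary order. Case differences are folded away entirely in this sketch.

```python
import unicodedata

def two_level_key(s: str) -> tuple[str, tuple[int, ...]]:
    # Hypothetical illustration only, not the Active Directory algorithm.
    # Primary level: NFKC plus case folding, so "1" and "¹" compare equal here.
    primary = unicodedata.normalize("NFKC", s).casefold()
    # Low-precedence level: 1 for each character NFKC would rewrite in
    # isolation (compatibility variants such as SUPERSCRIPT ONE), else 0.
    tiebreak = tuple(0 if unicodedata.normalize("NFKC", ch) == ch else 1 for ch in s)
    return primary, tiebreak

words = ["x¹", "x1", "X1"]
print(sorted(words, key=two_level_key))  # ['x1', 'X1', 'x¹']
```

Because the tie-breaker is consulted only when primary keys are equal, "x¹" still sorts immediately after "x1" rather than being pushed away from it.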