[cifs-protocol] 116011913598783 Active Directory server sort Unicode normalization

Tue Jan 19 18:08:29 UTC 2016

Douglas,
I'll review and follow up.

Thanks,
Edgar

-----Original Message-----
From: Sreekanth Nadendla 
Sent: Monday, January 18, 2016 7:21 PM
To: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Cc: MSSolve Case Email <casemail at microsoft.com>; cifs-protocol at lists.samba.org
Subject: 116011913598783 Active Directory server sort Unicode normalization

Casemail in Cc
Dochelp in Bcc

Hello Douglas,
Thank you for your inquiry about Active Directory Specifications. We have created incident # 116011913598783 to investigate this issue. One of the Open Specifications team members will contact you shortly.

Regards,
Sreekanth Nadendla
Microsoft Windows Open specifications

-----Original Message-----
From: Douglas Bagnall [mailto:douglas.bagnall at catalyst.net.nz] 
Sent: Monday, January 18, 2016 4:05 PM
To: Interoperability Documentation Help <dochelp at microsoft.com>
Cc: cifs-protocol at lists.samba.org
Subject: Active Directory server sort Unicode normalization

hi Dochelp,

I am trying to improve server sort for Samba. Is there a machine-readable version anywhere of the file "Active Directory Sort Table v02.pdf" from http://www.microsoft.com/en-us/download/details.aspx?id=1175
and if so, is there a license for its use?

I'm thinking of something akin to the text files in http://www.microsoft.com/en-us/download/details.aspx?id=10921.
Alternatively, is one of these more or less the same as the PDF (which
*looks* to be the case)? And how are these licensed?

Or is there a description somewhere of how to derive these tables from the Unicode documents?

Another question: does Active Directory have any way of sorting characters outside the basic multilingual plane? Following RFC4518, the character "🅌" ("SQUARED SD" https://codepoints.net/U+1F14C) would be NFKC normalized and sort equivalently to "SD", but I can't see how Windows would deal with that.
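
To make the example concrete, here is a minimal Python sketch using only the standard unicodedata module. It illustrates the Unicode side of the question only (the NFKC folding and the fact that the character needs a UTF-16 surrogate pair); it says nothing about what Active Directory actually does. The helper name utf16_units is made up for illustration.

```python
import unicodedata

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints (hypothetical helper)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

ch = "\U0001F14C"  # SQUARED SD, outside the basic multilingual plane

# NFKC applies the compatibility decomposition and then recomposes.
folded = unicodedata.normalize("NFKC", ch)
units = utf16_units(ch)

print(unicodedata.name(ch))     # character name from the Unicode database
print(folded)                   # the NFKC form
print(ord(ch) > 0xFFFF)         # True: does not fit in one 16-bit unit
print([hex(u) for u in units])  # the surrogate pair used by UTF-16
```

A sort table indexed by single 16-bit values would see the two surrogate units rather than the character itself, which is the case the question is about.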

In general it looks like Windows *almost* follows the RFC, but retains a bit of low precedence information in the case weight field (distinguishing e.g. the plain digit 1 from the superscript digit ¹) that would be lost to strict NFKC normalization. And it stops at 0xFFFF. Is that a fair summary?
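
As a rough illustration of the distinction I mean, here is a Python sketch comparing a strict NFKC sort key with a two-level key that keeps the original string as a low-precedence tie-breaker. This is only an assumption-laden model of the behaviour described above, not the Windows algorithm or the contents of the sort table; the function names nfkc_key and tiebreak_key are hypothetical.

```python
import unicodedata

def nfkc_key(s):
    """Strict key: compatibility variants collapse and compare equal."""
    return unicodedata.normalize("NFKC", s)

def tiebreak_key(s):
    """Two-level key: NFKC form first, original string as a low-precedence tie-breaker.

    This only models the idea of retaining extra information; it is not
    how Windows computes its case weight field.
    """
    return (unicodedata.normalize("NFKC", s), s)

values = ["x\u00b9", "x1", "x2"]  # x + SUPERSCRIPT ONE, x + DIGIT ONE, x + DIGIT TWO

# Under strict NFKC the first two entries have identical keys.
same_under_nfkc = nfkc_key("x\u00b9") == nfkc_key("x1")

# The two-level key still groups them together but orders them deterministically.
ordered = sorted(values, key=tiebreak_key)

print(same_under_nfkc)
print(ordered)
```

With the strict key the relative order of "x¹" and "x1" depends only on input order; with the two-level key they stay adjacent but remain distinguishable.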

thanks,
Douglas