[cifs-protocol] 116011913598783 Active Directory server sort Unicode normalization

Edgar Olougouna edgaro at microsoft.com
Thu Jan 28 17:00:28 UTC 2016


Douglas,

The Active Directory Sorting Weight Table document, “Active directory sort table v-2.PDF” (https://www.microsoft.com/en-us/download/details.aspx?id=1175), is obsolete.
We have filed a technical document bug against MS-ADTS to remove the reference. In fact, there is no actual “Active Directory sorting table”; sorting weights are not AD-specific. Windows-based Active Directory simply relies on what the operating system does. The correct reference for Unicode comparison in AD is [MS-UCODEREF].

To get correct Unicode sorting, implementers need to do exactly what Windows Server does, as documented.

[MS-UCODEREF] is a reference document which describes how Unicode strings are compared in Windows protocols and how Windows supports Unicode conversion to earlier codepages. The [MS-UCODEREF] internal build has already been updated to reference the files in the “Sorting Weight Tables” download, published on 1/19/2016: https://www.microsoft.com/en-us/download/details.aspx?id=10921

Per the patent map tool, [MS-UCODEREF] does not appear to require any license. 

The following resource may also help.
Using Unicode Normalization to Represent Strings
https://msdn.microsoft.com/en-us/library/windows/desktop/dd374126(v=vs.85).aspx

[MS-UCODEREF]: Windows Protocols Unicode Reference
3.1.5.2.3      Accessing the Windows Sorting Weight Table
. . .
3.1.5.2.3.1    Windows Sorting Weight Table
This section contains a link to detailed character weight specifications that permit consistent sorting and comparison of Unicode strings. The data is not used by itself but is used as one of the inputs to the comparison algorithm. The layout and format of data in this file is also specified in [MSDN-SWT].<3>
<3> Section 3.1.5.2.3.1:  The files in the download map to specific Windows versions as follows:
Version / File Name

Windows NT 4.0 operating system, Windows 2000, Windows XP, and Windows Server 2003:
    Windows NT 4.0 through Windows Server 2003 Sorting Weight Table.txt
Windows Vista:
    Windows Vista Sorting Weight Table.txt
Windows Server 2008:
    Windows Server 2008 Sorting Weight Table.txt
Windows 7 and Windows Server 2008 R2:
    Windows 7 and Windows Server 2008 R2 Sorting Weight Table.txt
Windows 8, Windows 8.1, Windows Server 2012, and Windows Server 2012 R2:
    Windows 8 and Windows Server 2012 Sorting Weight Table.txt
    Windows 8 Upper Case Mapping Table.txt
Windows 10 and Windows Server 2016 Technical Preview:
    Windows 10 Sorting Weight Table.txt

1        Introduction
This document is a companion reference to the protocol specifications. It describes how Unicode strings are compared in Windows protocols and how Windows supports Unicode conversion to earlier codepages. For example:
- UTF-16 string comparison: Provides linguistic-specific comparisons between two Unicode strings and provides the comparison result based on the language and region for a specific user.
- Mapping of UTF-16 strings to earlier ANSI codepages: Converts Unicode strings to strings in the earlier codepages that are used in older versions of Windows and the applications that are written for these earlier codepages.
1.3     Overview
This document describes the following protocols when dealing with Unicode strings on the Windows platform:
- UTF-16 string comparison: This string comparison is used to provide a linguistic-specific comparison between two Unicode strings. This scenario provides a string comparison result based on the expectations of users from different languages and different regions.
- The mapping of UTF-16 strings to earlier codepages: This scenario is used to convert between Unicode strings and strings in the earlier codepage, which are used by older versions of Windows and applications written for these earlier codepages.
1.4     Applicability Statement
This reference document is applicable as follows:
- To perform UTF-16 character comparisons in the same manner as Windows. This document only specifies a subset of Windows behaviors that are used by other protocols. It does not document those Windows behaviors that are not used by other protocols.
- To provide the capability to map between UTF-16 strings and earlier codepages in the same manner as Windows.
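
As a rough illustration of the second scenario (mapping Unicode strings to an earlier codepage), here is a minimal Python sketch. It uses Python's built-in cp1252 codec, which is an assumption made for illustration only: it is not the mapping procedure specified in [MS-UCODEREF], and Windows may map unrepresentable characters differently. The helper name to_codepage is hypothetical.

```python
def to_codepage(text: str, codepage: str = "cp1252") -> bytes:
    """Encode text in a legacy codepage (hypothetical helper).

    Characters with no equivalent in the codepage become '?'.
    This is Python's codec behaviour, not necessarily what Windows does.
    """
    return text.encode(codepage, errors="replace")


# U+00E9 (e with acute) exists in cp1252 as byte 0xE9.
print(to_codepage("caf\u00e9"))
# U+20AC (euro sign) exists in cp1252 as byte 0x80.
print(to_codepage("\u20ac"))
# U+03A9 (Greek capital omega) has no cp1252 equivalent and is replaced.
print(to_codepage("\u03a9"))
```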

MS-ADTS
6.5 Unicode String Comparison
This section specifies how the Unicode sort methods specified in [MS-UCODEREF] are utilized to perform comparisons of Unicode strings.
https://msdn.microsoft.com/en-us/library/cc223825.aspx
6.5.1 String Comparison by Using Sort Keys
To compare strings, the implementer needs to get a "sort key" for each string (see [MSASRT]). A binary comparison of the sort keys can then be used to arrange the strings in any desired order.
This section utilizes the GetWindowsSortKey and CompareSortKeys procedures, which are specified in [MS-UCODEREF].
The flags that need to be passed to GetWindowsSortKey depend on the comparison being performed. This is specified in the following table.
. . .
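
The sort-key approach described in 6.5.1 can be sketched in Python as follows. This is a toy model only: toy_sort_key and compare_sort_keys are hypothetical stand-ins, not the GetWindowsSortKey and CompareSortKeys procedures from [MS-UCODEREF], which use the published weight tables and the flags mentioned above. The sketch only shows the general idea that each string is turned into a byte string once, and ordering is then decided by plain binary comparison of those byte strings.

```python
def toy_sort_key(s: str) -> bytes:
    """Build a toy sort key: primary weights, a separator, then case weights.

    Assumption for illustration: the primary weight of a character is its
    case-folded code point, and the case weight is 2 for upper case, 1 otherwise.
    """
    folded = s.casefold()
    primary = b"".join(ord(ch).to_bytes(4, "big") for ch in folded)
    case = bytes(2 if ch.isupper() else 1 for ch in s)
    # The all-zero separator is lower than any primary weight, so a string
    # sorts before any longer string that it is a prefix of.
    return primary + b"\x00\x00\x00\x00" + case


def compare_sort_keys(a: bytes, b: bytes) -> int:
    """Binary comparison of two sort keys: returns -1, 0, or 1."""
    return (a > b) - (a < b)


words = ["banana", "Apple", "apple", "Banana"]
ordered = sorted(words, key=toy_sort_key)
print(ordered)
```

Because the case weights come after all primary weights, case only breaks ties between strings that are otherwise equal, which is why "apple" and "Apple" end up adjacent rather than separated as they would be in a raw code point sort.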

Thanks,
Edgar

-----Original Message-----
From: Edgar Olougouna 
Sent: Tuesday, January 19, 2016 12:08 PM
To: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Cc: MSSolve Case Email <casemail at microsoft.com>; cifs-protocol at lists.samba.org
Subject: RE: 116011913598783 Active Directory server sort Unicode normalization

Douglas,
I'll review and follow-up.

Thanks,
Edgar

-----Original Message-----
From: Sreekanth Nadendla 
Sent: Monday, January 18, 2016 7:21 PM
To: Douglas Bagnall <douglas.bagnall at catalyst.net.nz>
Cc: MSSolve Case Email <casemail at microsoft.com>; cifs-protocol at lists.samba.org
Subject: 116011913598783 Active Directory server sort Unicode normalization

Casemail in Cc
Dochelp in Bcc

Hello Douglas,
Thank you for your inquiry about Active Directory Specifications. We have created incident # 116011913598783 to investigate this issue. One of the Open Specifications team members will contact you shortly.
 
  
Regards,
Sreekanth Nadendla
Microsoft Windows Open specifications

-----Original Message-----
From: Douglas Bagnall [mailto:douglas.bagnall at catalyst.net.nz] 
Sent: Monday, January 18, 2016 4:05 PM
To: Interoperability Documentation Help <dochelp at microsoft.com>
Cc: cifs-protocol at lists.samba.org
Subject: Active Directory server sort Unicode normalization

hi Dochelp,

I am trying to improve server sort for Samba. Is there anywhere a machine readable version of the file "Active Directory Sort Table v02.pdf" from http://www.microsoft.com/en-us/download/details.aspx?id=1175
and if so, is there a license for its use?

I'm thinking of something akin to the text files in http://www.microsoft.com/en-us/download/details.aspx?id=10921.
Alternatively, is one of these more or less the same as the PDF (which *looks* to be the case)? And how are these licensed?

Or, is there somewhere a description of how to derive these tables from the Unicode documents?

Another question: does Active Directory have any way of sorting characters outside the basic multilingual plane? Following RFC4518, the character "🅌" ("SQUARED SD" https://codepoints.net/U+1F14C) would be NFKC normalized and sort equivalently to "SD", but I can't see how Windows would deal with that.

In general it looks like Windows *almost* follows the RFC, but retains a bit of low precedence information in the case weight field (distinguishing e.g. the plain digit 1 from the superscript digit ¹) that would be lost to strict NFKC normalization. And it stops at 0xFFFF. Is that a fair summary?
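
To make the points above concrete, here is a small Python sketch using the standard unicodedata module. It shows what NFKC does to the two example characters, and that U+1F14C needs two UTF-16 code units. The two_level_key function is a hypothetical illustration of keeping a low-precedence distinction after normalization; it is not a description of what Windows actually does.

```python
import unicodedata


def nfkc(s: str) -> str:
    """Apply Unicode NFKC normalization."""
    return unicodedata.normalize("NFKC", s)


def utf16_units(s: str) -> list:
    """Return the UTF-16 code units of s as integers."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def two_level_key(s: str) -> tuple:
    """Hypothetical key: NFKC form first, original string as a tiebreaker."""
    return (nfkc(s), s)


squared_sd = "\U0001F14C"    # SQUARED SD, outside the basic multilingual plane
superscript_one = "\u00B9"   # SUPERSCRIPT ONE

print(nfkc(squared_sd))          # compatibility form of SQUARED SD
print(nfkc(superscript_one))     # compatibility form of SUPERSCRIPT ONE
print([hex(u) for u in utf16_units(squared_sd)])  # surrogate pair
print(sorted([superscript_one, "1", "2"], key=two_level_key))
```

With strict NFKC alone, "1" and the superscript would compare equal; the tiebreaker in two_level_key keeps them distinct while still sorting both before "2".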

thanks,
Douglas


