Doubts about Samba's unicode translation tables

Douglas Bagnall douglas.bagnall at catalyst.net.nz
Mon Apr 22 05:26:40 UTC 2024


On 19/04/24 21:04, Xavi Hernandez via samba-technical wrote:
> The first question is why Samba uses two tables while Windows only requires
> one ?
> For what purpose is the lowercase translation table in Samba used ?
> Is the Samba's case-insensitive comparison method actually equal to Windows
> ?

I don't have real answers, but I think the current mappings date back to
this 2001 commit:

https://gitlab.com/samba-team/samba/-/commit/9bcd133e9e7b0cfe974f273fb23409d660af8358

The Windows sorting weight tables change often.
On https://www.microsoft.com/en-us/download/details.aspx?id=10921 we see:

   Windows Vista Sorting Weight Table.txt
   Windows 8 and Windows Server 2012 Sorting Weight Table.txt
   Windows Server 2008 Sorting Weight Table.txt
   Windows 7 and Windows server 2008 R2 Sorting Weight Table.txt
   Windows 8 Upper Case Mapping Table.txt
   Windows NT 4.0 through Windows Server 2003 Sorting Weight Table.txt
   Windows 10 Sorting Weight Table.txt

Sorting weights are not exactly the same thing as case mapping (apart
perhaps from the one called "Windows 8 Upper Case Mapping Table"). It seems
likely that a lot of the changes are for new Unicode characters beyond the
16-bit range (the Basic Multilingual Plane).
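
To illustrate why characters beyond the 16-bit range matter here, a rough
sketch follows. It uses the Unicode data bundled with Python, not Samba's or
Windows' tables, and the helper name upper_mappings is made up for this
example. It collects the simple one-to-one uppercase mappings below and above
U+FFFF; a table indexed by a 16-bit value cannot hold any of the second group.

```python
# Sketch only: uses Python's built-in Unicode database, which is an
# assumption standing in for "some newer Unicode version". It is neither
# Samba's table nor any of the Windows tables.

def upper_mappings(start, stop):
    """Return {code point: uppercase code point} for one-to-one mappings."""
    out = {}
    for cp in range(start, stop):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points are not characters
        up = chr(cp).upper()
        # Skip characters that do not change, and multi-character
        # results such as U+00DF (sharp s) -> "SS".
        if len(up) == 1 and ord(up) != cp:
            out[cp] = ord(up)
    return out

bmp = upper_mappings(0x0000, 0x10000)       # fits in a 16-bit indexed table
beyond = upper_mappings(0x10000, 0x110000)  # does not

print("mappings within 16 bits:", len(bmp))
print("mappings beyond 16 bits:", len(beyond))

# One example from beyond the 16-bit range: Deseret small letter long I.
print(hex(beyond[0x10428]))
```

The counts depend on the Unicode version of the Python build, which is
rather the point: each new Unicode release can add cased characters that an
older fixed table does not know about.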

"Windows 8 Upper Case Mapping Table.txt" contains at least some of the
changes listed in your differences.txt.

This GitLab thread is related:

https://gitlab.com/samba-team/samba/-/merge_requests/3258#note_1576341163

I have never got to the bottom of why we do what we do and how it differs
from Windows, but I suspect the answer is that it works well enough most
of the time. That's worrying, but not enough to make it a priority.

Thanks for looking and asking.

Douglas



