Doubts about Samba's unicode translation tables
Douglas Bagnall
douglas.bagnall at catalyst.net.nz
Mon Apr 22 05:26:40 UTC 2024
On 19/04/24 21:04, Xavi Hernandez via samba-technical wrote:
> The first question is why Samba uses two tables while Windows only requires
> one?
> For what purpose is the lowercase translation table in Samba used?
> Is Samba's case-insensitive comparison method actually equivalent to
> Windows'?
I don't have real answers, but I think the current mappings date back to
this 2001 commit:
https://gitlab.com/samba-team/samba/-/commit/9bcd133e9e7b0cfe974f273fb23409d660af8358
The Windows sorting weight tables change often.
On https://www.microsoft.com/en-us/download/details.aspx?id=10921 we see:
Windows Vista Sorting Weight Table.txt
Windows 8 and Windows Server 2012 Sorting Weight Table.txt
Windows Server 2008 Sorting Weight Table.txt
Windows 7 and Windows server 2008 R2 Sorting Weight Table.txt
Windows 8 Upper Case Mapping Table.txt
Windows NT 4.0 through Windows Server 2003 Sorting Weight Table.txt
Windows 10 Sorting Weight Table.txt
Sorting weights are not exactly the same thing as case mapping (apart
perhaps from the one called "Windows 8 Upper Case Mapping Table"). It seems
likely that a lot of the changes are for new Unicode characters beyond the
16-bit plane (the Basic Multilingual Plane).
"Windows 8 Upper Case Mapping Table.txt" contains at least some of the
changes listed in your differences.txt.
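To illustrate the point about characters beyond the 16-bit plane, here is a
rough Python sketch. It uses Python's built-in Unicode data, not Samba's or
Windows' tables, and the helper names (utf16_units, per_unit_lower) are made
up for the illustration. The idea is that a table indexed by single UTF-16
code units only ever sees the surrogate halves of such a character, so it
cannot case-map it.

```python
import struct

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints."""
    raw = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

def per_unit_lower(s):
    """Lowercase each 16-bit unit on its own, as a 64K-entry table would.

    Assumption: surrogate code units have no mapping and pass through
    unchanged, and only 1:1 mappings that stay inside the BMP are applied.
    """
    out = []
    for u in utf16_units(s):
        if 0xD800 <= u <= 0xDFFF:
            out.append(u)
            continue
        low = chr(u).lower()
        if len(low) == 1 and ord(low) < 0x10000:
            out.append(ord(low))
        else:
            out.append(u)
    return out

deseret_upper = "\U00010400"  # DESERET CAPITAL LETTER LONG I
deseret_lower = "\U00010428"  # DESERET SMALL LETTER LONG I

# Code-point-aware lowercasing folds the pair together...
full_match = deseret_upper.lower() == deseret_lower
# ...but a per-code-unit table leaves the surrogates untouched.
unit_match = per_unit_lower(deseret_upper) == utf16_units(deseret_lower)

print("full code point match:", full_match)
print("per 16-bit unit match:", unit_match)
```

Whether Windows or Samba actually treat such characters this way is not
something the sketch can tell us; it only shows why a 16-bit table alone
cannot express those mappings.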
This GitLab thread is related:
https://gitlab.com/samba-team/samba/-/merge_requests/3258#note_1576341163
I have never got to the bottom of why we do what we do and how it differs
from Windows, but I suspect the answer is that it works well enough most of
the time. That's worrying, but not worrying enough to have made it a
priority.
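As a rough illustration of why comparing through an upper case table and
comparing through a lower case table need not give the same answer, here is
a small Python sketch. It relies on Python's own Unicode case mappings, which
are only a stand-in for whatever tables Samba or Windows really use, so the
specific characters may well behave differently there. The function names are
hypothetical.

```python
def eq_via_upper(a, b):
    """Case-insensitive equality by upper-casing both sides."""
    return a.upper() == b.upper()

def eq_via_lower(a, b):
    """Case-insensitive equality by lower-casing both sides."""
    return a.lower() == b.lower()

pairs = [
    ("s", "\u017f"),  # LATIN SMALL LETTER LONG S
    ("i", "\u0131"),  # LATIN SMALL LETTER DOTLESS I
    ("k", "\u212a"),  # KELVIN SIGN
]

results = {}
for a, b in pairs:
    results[(a, b)] = (eq_via_upper(a, b), eq_via_lower(a, b))
    print("U+%04X vs U+%04X: upper=%s lower=%s"
          % (ord(a), ord(b), results[(a, b)][0], results[(a, b)][1]))
```

With Python's data the first two pairs are equal only via upper case and the
third only via lower case, so the choice of table changes which names
collide.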
Thanks for looking and asking.
Douglas