Doubts about Samba's unicode translation tables

Douglas Bagnall douglas.bagnall at catalyst.net.nz
Wed Apr 24 01:33:56 UTC 2024


On 23/04/24 20:39, Xavi Hernandez wrote:

>     I am curious whether "Windows 8 Upper Case Mapping Table.txt" from
> 
>     https://www.microsoft.com/en-us/download/details.aspx?id=10921
> 
>     matches the $UpCase table you find, and whether that means we just have an old
>     one from win2k days. I don't see a change in Linux's fs/ntfs/upcase.c though, so
>     I suspect not.
> 
> 
> I've done a bit more research. Actually, the kernel ntfs driver doesn't generate 
> the upcase table, it just loads it from the $UpCase file in the NTFS filesystem 
> and uses it for filename comparisons. The comparison function uses the table to 
> convert both strings to uppercase (maybe not strictly uppercase, but a canonical 
> value) and compares it. Nothing else.

Right. It turns out I was looking at the old fs/ntfs/, not the current fs/ntfs3/.
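
To make the comparison you describe concrete, here is a rough Python sketch of a
table-driven name comparison. It is not the ntfs3 code: the function names and
the ASCII-only table are invented for illustration, and I am only assuming the
general shape (one 16-bit entry per UTF-16 code unit, both names mapped through
the table before comparing).

```python
from array import array


def build_table(overrides):
    """Return a 65536-entry table of 16-bit values.

    Every code unit maps to itself unless `overrides` says otherwise.
    """
    table = array("H", range(0x10000))
    for src, dst in overrides.items():
        table[src] = dst
    return table


def utf16_units(name):
    """Split a string into its UTF-16 code units."""
    data = name.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def names_equal(a, b, table):
    """Compare two names after mapping each code unit through the table."""
    ua, ub = utf16_units(a), utf16_units(b)
    if len(ua) != len(ub):
        return False
    return all(table[x] == table[y] for x, y in zip(ua, ub))


# Illustrative table only: folds ASCII a-z to A-Z and nothing else.
ascii_table = build_table({c: c - 32 for c in range(ord("a"), ord("z") + 1)})
```

With this toy table, names_equal("README.txt", "readme.TXT", ascii_table) is
true, while any pair that differs outside a-z compares unequal.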

> I've looked at the code that creates NTFS filesystems (mkfs.ntfs in ntfsprogs 
> package) and I've seen that it supports 3 different upcase tables for 3 
> different Windows versions. I've extracted all 3 tables from ntfsprogs (winxp, 
> vista, win7), the table from the "Windows 8 Upper Case Mapping Table.txt" file 
> (win8), the table from Samba code (samba), and the table from a Windows 11 
> machine (win11).
> 
> What I've seen is that win7, win8 and win11 are identical, vista is different 
> from all the others, and winxp and samba are equal.
> 
> The ntfsprogs package also has code to generate a lowcase table. I generated the 
> lowcase table for winxp and compared it to the lowcase_table from Samba. They 
> are equal.
> 
> So it seems that Samba is using Windows XP tables.
> 
> Some questions:
> 
> Should we update the table to the latest Win8 ?

This is complicated, and I'm not the best person to answer.

As I understand it, you can mount an XP volume with a Windows 11 kernel (or 
current Linux), and nothing will go wrong, because the volume carries its own 
$UpCase table and the driver uses that one.

But if we change this table, upgrading Samba from 4.20 to 4.21 would cause 
"ȩ.txt" to collide with "Ȩ.txt". That could be very bad for a few users, and 
somewhere between irrelevant and good for everyone else.
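
A small sketch of how one might check a share for that kind of collision before
switching tables. The two dicts are stand-ins, not the real tables: I am
assuming, as in the example above, that only the newer table folds U+0229 to
U+0228, and the helper names are made up.

```python
def upcase(name, table):
    """Map each character through `table`; unmapped characters are unchanged."""
    return "".join(table.get(ch, ch) for ch in name)


def find_collisions(names, table):
    """Return groups of names that compare equal under `table`."""
    groups = {}
    for name in names:
        groups.setdefault(upcase(name, table), []).append(name)
    return [group for group in groups.values() if len(group) > 1]


# Stand-in for the current table: ASCII folding only (illustrative).
old_table = {chr(c): chr(c - 32) for c in range(ord("a"), ord("z") + 1)}

# Stand-in for the newer table: additionally folds U+0229 to U+0228.
new_table = dict(old_table)
new_table["\u0229"] = "\u0228"

files = ["\u0229.txt", "\u0228.txt", "notes.txt"]
```

Here find_collisions(files, old_table) finds nothing, while
find_collisions(files, new_table) reports the two names as one group. Running
something like this over real directory listings with the real tables would
tell an admin whether the upgrade affects them.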

If we do update the table, I think it would be best to generate an upcase-table.c 
at build time, using either a copy of "Windows 8 Upper Case Mapping 
Table.txt" (if we can work out the license, about which we are picky), or 
declarative ranges, as ntfsprogs and the old kernel driver do.
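
For the declarative-ranges option, a build step could look roughly like the
sketch below. The (start, end, delta) range format, the function names and the
output layout are my own assumptions, not what ntfsprogs or the kernel actually
use, and the single a-z range is only a placeholder for the real list.

```python
def expand_ranges(ranges, size=0x10000):
    """Expand (start, end, delta) ranges into a full identity-based table."""
    table = list(range(size))
    for start, end, delta in ranges:
        for cp in range(start, end + 1):
            table[cp] = (cp + delta) & 0xFFFF
    return table


def render_c(table, name="upcase_table", per_line=8):
    """Render the table as the text of a C source file."""
    lines = [
        "/* Generated file, do not edit. */",
        "#include <stdint.h>",
        "",
        f"const uint16_t {name}[{len(table)}] = {{",
    ]
    for i in range(0, len(table), per_line):
        row = ", ".join(f"0x{v:04x}" for v in table[i:i + per_line])
        lines.append(f"\t{row},")
    lines.append("};")
    return "\n".join(lines) + "\n"


# Placeholder range list: a-z -> A-Z. The real list would be much longer.
EXAMPLE_RANGES = [(0x0061, 0x007A, -32)]

if __name__ == "__main__":
    with open("upcase-table.c", "w") as f:
        f.write(render_c(expand_ranges(EXAMPLE_RANGES)))
```

Generating the file this way keeps the reviewed source small (a list of ranges)
and sidesteps shipping the Microsoft text file, at the cost of having to verify
the ranges against a known-good table.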

> Should we support different tables and make it configurable ?
> Should we dynamically load the table from the shared filesystem itself (similar 
> to accessing an existing NTFS) ?
> 
> Should we differentiate regular case-insensitive comparison from filename 
> comparison ?

For now, I will just say they are reasonable questions.

Douglas



