CIFS vs. NFS and other filesystems (was Client for Samba Networks)

Steven French sfrench at us.ibm.com
Tue Dec 18 10:55:29 GMT 2001


A typo in my earlier list:

There should not be an entry for Unicode under NFS v3 strengths.

Also - the comment that:
"The use of UCS-2 is as much a strength as it is a weakness; Unicode does
not stop at the 16-bit boundary, and the lack of support for the
additional codepoints is of some concern to users of modern non-Western
languages."

is interesting. Whether sending Unicode/UCS-2 on the wire is a strength or
a weakness of CIFS is debatable, but since it is optional (for CIFS) and
solved real customer internationalization problems, I find it easier to
view as a strength. I don't know what would happen with the longer UTF-16
or UTF-32 strings (see http://www.unicode.org/unicode/reports/tr19/ ),
though. We should try it. The following quote from the Unicode FAQ
(http://www.unicode.org/unicode/faq/utf_bom.html#6) made me wonder what
would happen if one of these strings were sent by a CIFS server or client
(it should not be too hard to try).
                                                                               
        "Unicode was originally designed as a pure 16-bit encoding, aimed at   
        representing all modern scripts. (Ancient scripts were to be           
        represented with private-use characters.) Over time, and especially    
        after the addition of over 14,500 composite characters for             
        compatibility with legacy sets, it became clear that 16-bits were not  
        sufficient for the user community. Out of this arose UTF-16.           
                                                                               
                                                                               
        UTF-16 allows access to 63K characters as single Unicode 16-bit units. 
        It can access an additional 1M characters by a mechanism known as      
        surrogate pairs. Two ranges of Unicode code values are reserved for    
        the high (first) and low (second) values of these pairs. Highs are     
        from 0xD800 to 0xDBFF, and lows from 0xDC00 to 0xDFFF. In Unicode 3.0, 
        there are no assigned surrogate pairs. Since the most common           
        characters have already been encoded in the first 64K values, the      
        characters requiring surrogate pairs will be relatively rare (see      
        below)."                                                               
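
To make the surrogate mechanism in the quote concrete, here is a rough
sketch in Python (used purely for illustration). The function names and the
sample file name are made up for this example, and nothing here reflects
how any particular CIFS client or server actually handles such strings. It
only shows that one character above U+FFFF becomes two 16-bit units on the
wire, so code that counts UCS-2 units would see one more "character" than
is really there.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into (high, low)."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> 0xD800..0xDBFF
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> 0xDC00..0xDFFF
    return high, low


def utf16_units(text):
    """Return the 16-bit code units of text encoded as little-endian UTF-16."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


# Hypothetical file name containing one character outside the first 64K values.
name = "a\U00010400.txt"
units = utf16_units(name)

print(len(name), "code points,", len(units), "16-bit units")
print([hex(u) for u in units])
```

A string like `name` above would be a reasonable first test case to send
between a CIFS client and server to see what comes back.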
                                                                               







Steve French
Senior Software Engineer
Linux Technology Center - IBM Austin
phone: 512-838-2294
email: sfrench at us.ibm.com




