Case Sensitivity vs. Case Preservation

MCCALL,DON (HP-USA,ex1) don_mccall at hp.com
Wed Mar 14 16:53:09 GMT 2001


Hi John,
I recently wrote a little internal article within HP to try to show graphically
the difference between case sensitivity and case preservation (Unix is case
sensitive; NT only 'preserves case').  Perhaps this will help you understand
it better and configure your Samba server to do what you want.  It was
originally an MS Word document, so you may have to fiddle around with the
window size to make it really readable...
Any comments or corrections WELCOME!!!
Hope this helps,
Don McCall



Title: Being Sensitive to Case
Author: Don McCall, Senior Support Engineer, Hewlett-Packard
The differences between PC and Unix filesystems are legion: how big they can
be, what attributes they support, what characters are legal for file and
directory names, etc.  Samba does a truly heroic job of handling these
differences.  One way it does so is by recognizing and handling the
difference between an operating system that is case sensitive (Unix) and one
that simply preserves case (Microsoft Windows 9x, Windows NT, and Windows
2000).  In this article I'd like to examine this difference.

Let us take for our case (no pun intended) study four filenames:
FILEONE.TXT
filetwo.txt
Filethree.txt
FiLeFoUr.TxT

The first file (FILEONE.TXT) is all uppercase.
The second file (filetwo.txt) is all lowercase.
The third file (Filethree.txt) is a special 'case' (again, forgive the pun):
only its first letter is uppercase.
The fourth file (FiLeFoUr.TxT) is truly 'mixed' case, representative of the
myriad permutations a filename could take on.

In Unix, if we used the touch command to create these four files, we would
get exactly what we requested, i.e., a set of files in the directory whose
names exactly match the case we typed in:

-rw-r--r--   1 root       sys              0 Dec 19 15:17 FILEONE.TXT
-rw-r--r--   1 root       sys              0 Dec 19 15:18 FiLeFoUr.TxT
-rw-r--r--   1 root       sys              0 Dec 19 15:18 Filethree.txt
-rw-r--r--   1 root       sys              0 Dec 19 15:18 filetwo.txt

Note that the Unix command 'll' doesn't try to pretty this up; it shows the
names just as it sees them.  I personally approve of commands that actually
do what you tell them to, without embellishment.  But that's me.
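
If you want to check which kind of filesystem a given directory lives on, a
small probe can tell you.  The function below is a hypothetical helper, not a
standard tool: it creates a temporary file with a mixed-case name and asks
whether the same name with every letter's case swapped also appears to exist.
On a case-sensitive filesystem (typical Unix) it should not; on a
case-preserving but case-insensitive one (such as NTFS) it should.

```python
import os
import tempfile


def is_case_sensitive(directory: str) -> bool:
    """Probe whether filenames in `directory` are compared case sensitively.

    Hypothetical helper: creates a uniquely named temporary file whose name
    mixes upper and lower case, then checks whether the case-swapped name
    also "exists".  That only happens on a case-insensitive filesystem.
    """
    fd, path = tempfile.mkstemp(prefix="CaseProbe", dir=directory)
    os.close(fd)
    try:
        head, tail = os.path.split(path)
        swapped = os.path.join(head, tail.swapcase())  # only the file part
        return not os.path.exists(swapped)
    finally:
        os.remove(path)  # always clean up the probe file


if __name__ == "__main__":
    where = tempfile.gettempdir()
    print(where, "is case sensitive:", is_case_sensitive(where))
```

Run it against a directory on a Samba share from a Windows client and again
on the Unix server itself to see the two behaviors side by side.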

Now let's move on to another operating system, Windows NT.  For this
discussion we will be using Windows NT 4.0 Workstation with an NTFS
filesystem.  Let us again create our four example files, this time using the
Windows Explorer graphical user interface (GUI) and the New > Text Document
pull-down menu.

We type in "FILEONE.TXT" (over the obtuse and chatty default name "New Text
Document.txt" that Windows NT Explorer favors for text documents) and hit
Return.  Immediately we run into a snag: the file we just created appears as
"Fileone.txt", NOT "FILEONE.TXT".  This is the Windows NT Explorer interface
being 'nice' to you, assuming that NO ONE would actually want to look at a
filename in all uppercase.  I mean, how pedestrian can one get?  Be
reassured, however: if you go into the 'command prompt' window and actually
do a 'dir' in that directory, you will see that the file DOES show up in all
uppercase, FILEONE.TXT.
 
Hmmm, this command prompt window appears to be useful.  Let's remember it for
our next examples, shall we?

Again in the Windows NT Explorer GUI, we create our next example file, typing
in "filetwo.txt" in all lowercase, and hit Return.  Ah, that's much better.
Explorer shows us the filename just as we typed it, "filetwo.txt", and the
'dir' command in the command prompt window agrees.  We now begin to feel a
bit more comfortable.  (Fools that we are...)

Let's move on to our third example file, "Filethree.txt".  OK, I admit it: I
have thrown in a curve here.  This is not a standard 8.3 filename.  (8.3 is a
curiosity from the days when Microsoft operating systems had no concept of
'long filenames': every name was limited to at most 8 characters, plus an
optional extension ".xxx" of up to 3 characters, where xxx represented some
file 'type' meaningful to specific applications.)  This filename contains
NINE letters before the ".".
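
The 8.3 rule is simple enough to write down as a check.  The function below
is a rough, hypothetical sketch: it tests only the lengths (1 to 8 characters
before the dot, an optional extension of 1 to 3 characters) and treats spaces
or a second dot as marking a 'long' name, but it ignores the finer points of
which characters DOS actually allowed.

```python
def is_8_3(name: str) -> bool:
    """Rough check for a DOS-style 8.3 filename (lengths only)."""
    if " " in name or name.count(".") > 1:
        return False              # spaces or extra dots: a 'long' name
    base, dot, ext = name.partition(".")
    if not 1 <= len(base) <= 8:
        return False              # base name must be 1-8 characters
    if dot and not 1 <= len(ext) <= 3:
        return False              # extension, if present, is 1-3 characters
    return True


# Only Filethree.txt (nine characters before the dot) fails the test.
for n in ["FILEONE.TXT", "filetwo.txt", "Filethree.txt", "FiLeFoUr.TxT"]:
    print(n, is_8_3(n))
```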
 
Let's see how Explorer handles this.  We type in (very carefully)
"Filethree.txt".  Wonderful!  Explorer shows us just what we would expect,
"Filethree.txt".  Does the command prompt 'dir' command agree?  Indeed it
does.  We appear to be getting the hang of this case thing.

[Apologies to all you MS wizards out there who know that I have glossed over
the fact that Windows NT has ALSO created a special 8.3 name to go along with
this filename; that's a topic for another day...]

Now our final example file, a ReAlLy mixed-up case file.  We again use
Windows NT Explorer, and again both Explorer and the command prompt show the
file just as we would expect: "FiLeFoUr.TxT".

So, all in all, not a bad track record: except for our little 'helper'
Explorer choosing to display filenames typed in all "UPPER" case as the more
aesthetically pleasing "Upper" case, we get what we ask for.

No, not so fast: we forgot that Windows treats filenames that conform to the
8.3 specification differently from 'long' filenames.  Let's see what happens
if we create an all-UPPERCASE filename that does NOT conform to the 8.3 spec.
We create (using Explorer) a file named "FILEFIFTY.TXT" (note the 9
characters before the ".") and, lo and behold, Explorer and the command
prompt 'dir' command are in complete and hearty agreement: the file is shown
just as we asked, "FILEFIFTY.TXT".  I guess if you're long enough, you don't
HAVE to be aesthetically pleasing...
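
The Explorer behavior observed so far can be summed up as a display rule.
The sketch below is a model of what was observed on NT 4.0 Workstation, not
Explorer's actual code, and the function names are hypothetical: an
all-uppercase name that fits 8.3 is shown with only its first letter
capitalized, and every other name is shown exactly as stored.  The stored
name itself never changes, which is why 'dir' shows the real thing.

```python
import re


def is_8_3(name: str) -> bool:
    # Rough 8.3 test: 1-8 character base, optional '.' plus 1-3 character
    # extension, no spaces and no second dot.
    return re.fullmatch(r"[^. ]{1,8}(\.[^. ]{1,3})?", name) is not None


def explorer_display(stored: str) -> str:
    """How NT 4.0 Explorer appeared to *display* a stored name (a model)."""
    if is_8_3(stored) and stored.isupper():
        return stored.capitalize()   # "FILEONE.TXT" -> "Fileone.txt"
    return stored                    # long or mixed-case names are left alone


for stored in ["FILEONE.TXT", "filetwo.txt", "Filethree.txt",
               "FiLeFoUr.TxT", "FILEFIFTY.TXT"]:
    print(f"dir: {stored:15} Explorer: {explorer_display(stored)}")
```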

OK, I've laid all this groundwork not to make fun (well, not entirely) of
Windows NT's case-handling abilities, but to illustrate a POINT.  Windows NT
'preserves' case.  It is NOT 'case sensitive'.  Internally, it matters not
one whit to Windows NT whether you named the file FILE.txt, File.txt,
file.txt, or FiLe.TxT.  They're all the same name to Windows NT.

Don't believe me?  Try it: we have a directory with the file FILEFIFTY.TXT
right here.  We try to create (using Explorer) a file named FileFifty.txt.
BUZZZZZ, wrong answer.  Explorer responds: "A file with the name you
specified already exists. Specify a different filename."

More proof?  OK, we like the 'command prompt' window; it has that nice,
non-GUI, command-line feel that we Unix heads are so comfortable with.  If we
use the 'dir' command to list a specific file, for instance:

dir FILEFIFTY.TXT

you will get the listing for FILEFIFTY.TXT.  But if you type in

dir FileFifty.txt

or

dir FILEFifty.TXT

or

dir filefifty.txt

or ...

Well, you can work out all the permutations yourself; I guess I should have
stuck to a shorter filename.  The bottom line is that ALL of these
permutations will return the entry for the file FILEFIFTY.TXT.

Getting the picture?  As the tagline of "Highlander" puts it: There can be
only ONE!

Unlike Unix, where you can have all of the files
fileone.txt
Fileone.txt
FILEONE.txt
FiLeOnE.TxT
existing at once in the same directory, in Windows this is impossible.
That's the difference between a filesystem that PRESERVES case and one that
is actually CASE SENSITIVE.
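
The distinction can be captured with two toy directory models.  Both classes
are hypothetical sketches for illustration only: the Unix-style one compares
names exactly, while the Windows-style one remembers each name as typed but
compares names after case-folding, so a second name differing only in case
is rejected and a lookup in any case finds the original.

```python
from typing import Optional


class CaseSensitiveDir:
    """Unix-style directory: names are compared exactly."""

    def __init__(self) -> None:
        self.names = set()

    def create(self, name: str) -> None:
        if name in self.names:
            raise FileExistsError(name)
        self.names.add(name)

    def lookup(self, name: str) -> Optional[str]:
        return name if name in self.names else None


class CasePreservingDir:
    """Windows-style directory: case is preserved but not significant."""

    def __init__(self) -> None:
        self.names = {}  # case-folded key -> name exactly as created

    def create(self, name: str) -> None:
        key = name.casefold()
        if key in self.names:
            raise FileExistsError(self.names[key])
        self.names[key] = name

    def lookup(self, name: str) -> Optional[str]:
        return self.names.get(name.casefold())


if __name__ == "__main__":
    unix = CaseSensitiveDir()
    for n in ["fileone.txt", "Fileone.txt", "FILEONE.txt", "FiLeOnE.TxT"]:
        unix.create(n)                     # all four coexist
    print(sorted(unix.names))

    nt = CasePreservingDir()
    nt.create("FILEFIFTY.TXT")
    print(nt.lookup("filefifty.txt"))      # FILEFIFTY.TXT
    try:
        nt.create("FileFifty.txt")
    except FileExistsError as err:
        print("already exists:", err)     # already exists: FILEFIFTY.TXT
```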

And that brings me to what I really want to talk about, which is how
CIFS/9000 Server (Samba, to the rest of the world) deals with this.

As you would expect from an application that grew up bridging the gap
between the Unix and Windows worlds, it is very flexible.  Good news, bad
news: with flexibility comes responsibility, and sometimes not a little
confusion.  In the interest of keeping this article short enough that
someone may actually READ it, let's restrict our conversation to case
preservation/sensitivity and leave out the 'mangled names' permutations.
There are four configuration options that Samba provides to define its
behavior in matters of 'case':
preserve case = (yes/no)
short preserve case = (yes/no)
default case = (upper/lower)
case sensitive = (yes/no)

The first three options define, in essence, how a filename will be written
to the Unix filesystem underneath Samba; they loosely correspond to how
Samba will PRESERVE case.  "preserve case" and "short preserve case" both do
the same thing: the first for filenames that are NOT 8.3, and the second
specifically for filenames that conform to the older 8.3 DOS naming
conventions.

If these options are set to "yes" (the default), then a file will be saved
with the case as it is presented by the client.  That is, if you create a
file on a Samba share from Windows NT Explorer with the name "FiLeNaMe.TxT",
a Unix 'll' of the file will show that its name is indeed "FiLeNaMe.TxT".

The "default case" option defines how a filename will be saved when the
'preserve case' option that applies to it is set to "no".  In that case the
whole name is forced into the default case, regardless of how the client
'presents' it.  If "default case = lower" (which is the default), the file
we create in Explorer named "FiLeNaMe.TxT" will actually be saved as
"filename.txt" when we look at it with the Unix 'll' command.  If
"default case = upper", it will be saved as "FILENAME.TXT".

Whew!  That's a lot of words to explain something this simple.  Let's go
back to our four example files and look at what actually happens.

Let's take the Samba defaults first:
preserve case = yes
short preserve case = yes
default case = lower

Using the Windows NT 4.0 Workstation Explorer interface, let's create our
four files on a Samba share.  In Explorer we type in the four filenames:
FILEONE.TXT
Filetwo.txt
filethree.txt
FiLeFoUr.TxT

A Unix 'll' command will show us:
-rwxr--r--   1 ddmc       users            0 Dec 19 16:40 FILEONE.TXT
-rwxr--r--   1 ddmc       users            0 Dec 19 16:41 FiLeFoUr.TxT
-rwxr--r--   1 ddmc       users            0 Dec 19 16:40 Filetwo.txt
-rwxr--r--   1 ddmc       users            0 Dec 19 16:41 filethree.txt

Lovely!  Just what we asked for.

Now, being the good little researchers we are, we change ONE thing at a time
and observe the results:
preserve case = no  <change>
short preserve case = yes
default case = lower

We remove and recreate the four files in the same manner as above, and our
trusty Unix 'll' command shows us:
-rwxr--r--   1 ddmc       users            0 Dec 19 16:45 filefour.txt
-rwxr--r--   1 ddmc       users            0 Dec 19 16:45 fileone.txt
-rwxr--r--   1 ddmc       users            0 Dec 19 16:45 filethree.txt
-rwxr--r--   1 ddmc       users            0 Dec 19 16:45 filetwo.txt

Ooooh, this doesn't look good!  I can understand 'filethree.txt' ending up in
lowercase; after all, it is not an 8.3 filename, so it falls under the
auspices of the 'preserve case = no' option.  But what about the other
three?  They are all good little 8.3 filenames.  Why was case NOT preserved?
Apparently, 'short preserve case' is dependent on 'preserve case'.  That is
to say, in order for 'short preserve case = yes' to work, 'preserve case =
yes' must also be set.


OK, let's move on to the case where we specify BOTH preserve case = yes and
short preserve case = no, with default case = lower (by default).

Aha!  This is more like it.  In Explorer, when we create a file with the
default name, New Text Document (2).txt, ll shows us:

-rwxr--r--   1 ddmc       users            0 Dec 19 17:10 New Text Document (2).txt

When we create FileWW.txt in Explorer, it becomes fileww.txt, as we would
expect: since this name conforms to 8.3, the 'short preserve case' option
applies, and the 'default case' of lower is used to create the filename on
the Unix system.

Another test:
preserve case = yes
short preserve case = no
default case = upper

Again, success.  Creating "HeresALongFileName.TxT" in Explorer yields
"HeresALongFileName.TxT" in our Unix listing.  Creating "ShrtFiLe.TxT" in
Explorer yields "SHRTFILE.TXT" in our Unix listing.  This is good: we told
Samba to preserve case for non-8.3 filenames, and it did.  We told Samba to
use default case = upper for 8.3 filenames, and sure enough, it converted our
mixed-case "ShrtFiLe.TxT" to all uppercase.
 
Are you getting happy yet?  I sure am.  This goes a long way toward
explaining how Samba 'preserves' case or not, depending on some pretty
flexible configuration options.
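
All four experiments are consistent with one short rule, written down in the
sketch below.  It is a model inferred from the results reported in this
article, not code taken from Samba; the function name is hypothetical and the
parameter names merely mirror the smb.conf options.  Note how it captures the
dependency found earlier: when preserve_case is False, short_preserve_case
never gets a say.

```python
import re


def is_8_3(name: str) -> bool:
    # Rough 8.3 test: 1-8 character base, optional 1-3 character extension,
    # no spaces and no second dot.
    return re.fullmatch(r"[^. ]{1,8}(\.[^. ]{1,3})?", name) is not None


def stored_name(name: str, preserve_case: bool = True,
                short_preserve_case: bool = True,
                default_case: str = "lower") -> str:
    """Name the file would get on the Unix side, according to this model."""
    # Case is kept only if 'preserve case' is on AND, for 8.3 names,
    # 'short preserve case' is on as well.
    if preserve_case and (short_preserve_case or not is_8_3(name)):
        return name
    # Otherwise the whole name is forced into the 'default case'.
    return name.upper() if default_case == "upper" else name.lower()


if __name__ == "__main__":
    # The configurations tried in the article, with sample names.
    print(stored_name("FiLeFoUr.TxT"))                    # FiLeFoUr.TxT
    print(stored_name("FILEONE.TXT", preserve_case=False))  # fileone.txt
    print(stored_name("FileWW.txt", short_preserve_case=False))  # fileww.txt
    print(stored_name("ShrtFiLe.TxT", short_preserve_case=False,
                      default_case="upper"))              # SHRTFILE.TXT
    print(stored_name("HeresALongFileName.TxT", short_preserve_case=False,
                      default_case="upper"))              # unchanged
```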

But how is this going to affect our clients when they start LOOKING for
files?  Well, remember that all Windows OSes (WinNT, Win98, etc.) PRESERVE
case but are not case SENSITIVE.  One result of this is a somewhat
laissez-faire attitude, in applications and in the OS itself, toward FINDING
a file of a specific name.  A program could, for instance, CREATE a file
named FileName.TXT, then refer to it as filename.txt when it next opens the
file, and expect to find it.

This presents certain problems when you are a truly case-sensitive operating
system like Unix.  FileName.TXT and filename.txt could BOTH be present in the
same directory; which one does the client really want?

The answer is found in the 'case sensitive' configuration parameter.  The
other parameters define how we SAVE filenames when we create files through
Samba.  This parameter determines the rules we follow when we try to RESOLVE
a filename given to us by a client, and it is the crux of whether that
client gets WHAT IT EXPECTS or not.

By default, 'case sensitive' = no.  This means that no matter HOW the
Windows OS or application passes the filename to us, Samba will match the
first filename it stumbles across that matches the requested filename
REGARDLESS of case.

For instance, suppose you have five files in your Samba directory:

Aa.txt  contains <Aa>
aA.txt  contains <aA>
AA.txt  contains <AA>
aa.txt  contains <aa>
AA.TXT  contains <AA.TXT>

When you look at the directory via Windows Explorer, all five files will
appear, with the unfortunate confusion that there will be TWO entries for
Aa.txt (remember, our friendly Windows Explorer is going to display an
all-UPPERCASE 8.3 name as if only the first character were capitalized).

Unfortunately, with 'case sensitive' = no, you have no way to tell Samba
WHICH file you want to open.  It doesn't matter which file you click on;
Samba is going to match it to the first file that has a caseless match for
the characters you provide.

In our example above, for instance, we will ALWAYS get the text "AA.TXT"
when we double-click on ANY of the above filenames in the Windows Explorer
window.  This is PROBABLY NOT the behavior you would desire.
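
Here is a toy model of name resolution as described above.  It is a
simplification under stated assumptions, and the function name is
hypothetical: the list order stands in for whatever order a directory scan
happens to return (AA.TXT is assumed to come first, which would match the
behavior reported), and the real Samba code may differ in detail, for
instance by trying an exact-case match before it starts scanning.

```python
from typing import List, Optional


def resolve(requested: str, entries: List[str],
            case_sensitive: bool = False) -> Optional[str]:
    """Return the directory entry a client request would open (a model)."""
    if case_sensitive:
        # 'case sensitive = yes': only an exact match will do.
        return requested if requested in entries else None
    wanted = requested.casefold()
    for entry in entries:             # scan in (assumed) directory order...
        if entry.casefold() == wanted:
            return entry              # ...and the first caseless match wins
    return None


if __name__ == "__main__":
    # Assumed scan order; the real order depends on the filesystem.
    entries = ["AA.TXT", "Aa.txt", "aA.txt", "AA.txt", "aa.txt"]
    for name in ["Aa.txt", "aA.txt", "aa.txt"]:
        print(name, "->", resolve(name, entries),
              "| case sensitive:", resolve(name, entries, case_sensitive=True))
```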

Fortunately, changing 'case sensitive' to yes will actually do what you
expect.  Windows 'preserves' case, so when it sends the SMB request to 'open'
the file, it will send the name with the correct case; more correct, in fact,
than what Windows Explorer presents to YOU.  That is, the file AA.TXT will be
requested by the name AA.TXT, not by what you see in the Explorer window,
Aa.txt.

I trust you can see the potential for confusion and error here, but let me
beat the dead horse a little (don't report me to the ASPCA; it's just an
expression, OK?).  With case sensitive = yes, expect to run into the
following issues:
1.	Sloppy programming: if your program CREATES a file named
Initialize.INI, it had better always try to open it as Initialize.INI.  Not
initialize.ini.  Not INITIALIZE.INI.  You get the drift.
2.	DOS and WfW clients will probably NOT be able to use files created by
later clients (Win9x, NT) running the same program.  Let me illustrate by
example.

Say we have an application with versions that run on WfW, DOS, Win98, and
WinNT.  The Win98 and WinNT versions create and use a file named
Startup.dat.  The WfW and DOS versions use the same file, but of course they
must refer to it as STARTUP.DAT (DOS and WfW understand only 8.3 uppercase
filenames).  If you install this program on your various PCs and decide that
you want them ALL to use the same initialization file (Startup.dat), you
might have the installation program put that file on a Samba share
accessible to all clients.  Won't work: if the file is named Startup.dat,
the WfW and DOS clients won't see it; if the file is named STARTUP.DAT, the
Win98/NT clients won't see it.
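
The Startup.dat dilemma can be tabulated with a few lines of code.  This is a
hypothetical sketch: it assumes, as the text does, that the Win98/NT version
asks for Startup.dat while the DOS/WfW version asks for STARTUP.DAT, and it
reduces Samba's lookup to a plain exact or case-insensitive comparison.

```python
from typing import Dict

# Name each version of the (hypothetical) application asks the server for.
REQUESTED_NAME = {
    "Win98/NT": "Startup.dat",
    "DOS/WfW": "STARTUP.DAT",   # 8.3 uppercase only
}


def who_can_open(name_on_share: str, case_sensitive: bool) -> Dict[str, bool]:
    """For each client, does its request match the file on the share?"""
    def matches(requested: str) -> bool:
        if case_sensitive:
            return requested == name_on_share
        return requested.casefold() == name_on_share.casefold()

    return {client: matches(req) for client, req in REQUESTED_NAME.items()}


if __name__ == "__main__":
    for on_share in ["Startup.dat", "STARTUP.DAT"]:
        for cs in (True, False):
            print(f"{on_share:12} case sensitive={cs!s:5}",
                  who_can_open(on_share, cs))
```

With case_sensitive=True exactly one family of clients can open the file,
whichever name it is stored under; with the default (False) both can.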

So what's the practical application of this lengthy diatribe?  A simple rule
of thumb emerges: if the files on your share are going to be mainly created
and accessed by Windows clients, leave the defaults alone.  If you HAVE to
have multiple filenames in the same directory that differ only by case,
change the 'case sensitive' option to yes, and recognize that this may cause
some inexplicable behavior on the part of Windows client applications
accessing that directory.


-----Original Message-----
From: JON GERDES [mailto:GERDESJ at gkn-whl.co.uk]
Sent: Wednesday, March 14, 2001 10:59 AM
To: drew.buckman at mrmfulfillment.com
Cc: samba at us5.samba.org
Subject: Re: File name problem with Win2K


Win2K doesn't differentiate files/directories by case so you are bound to
get problems doing this.  The "fix" would have to come from MS - and that's
a little unlikely.

Try creating directories with the same name and different case on your W2K's
hard disc and see how far you get.  Samba has a whole swathe of parameters
to try and get Windows/DOS to see a consistent view of a Samba share/mount.

Cheers
Jon Gerdes

>>> Drew Buckman <drew.buckman at mrmfulfillment.com> 03/14/01 03:30pm >>>
Here is the problem. I make a directory "xyz" and put file "a" in it, 
then make a directory "XYZ" and put file "b" in it. When viewing the 
samba share with Win2K I see both directories "xyz" and "XYZ", but the 
only file that both directories have in them is file "a".  If you use 
NT4 you will see a directory "xyz" and a directory "Xyz", but you still 
have the problem.

It looks like Win2K (and Win NT4) is confused about the case of the 
directories. Is there any fix for this, other than kicking the 2K box?


-- 
To unsubscribe from this list go to the following URL and read the
instructions:  http://lists.samba.org/mailman/listinfo/samba

