Name mangling patch

Francois Gouget fgouget at psn.net
Sun Mar 14 08:44:38 GMT 1999


	I started looking into problems I was having when doing a 'del
/s *.class'. Out of about 690 files, about 190 were not deleted
correctly. So I started looking into 'smbd/mangle.c', and after a few hours
I realized it had all been fixed in 2.0.3.

	Well, almost: I still had 2 files that did not delete
correctly because of clashes in the mangled name. To do the mangling,
Samba uses base 36, but there are other characters that are not mangled
by Windows that we can use. The list I have found is the following:
"_-!@#$%" ('~' is not mangled either, but I'm not sure we can use it).
Adding these means we use base 43. Since the mangled name ends in two
such digits, the number of possible suffixes grows from 36^2 = 1296 to
43^2 = 1849, which reduces the risk of a clash by a factor of about 1.43.
For me that solves it: I no longer get name clashes :-).
	This change is in 'diff1990313-mangle1.txt'.

	While I was poking in mangle.c I realized that in many cases
(every filename with an extension of 3 characters or fewer) we were
computing the filename checksum twice: once with the extension and once
without. This certainly does not affect performance much, but it still
bothered me, so I made up a patch to avoid this.
	This change is in 'diff1990313-mangle2.txt'.


	Back to the subject of name mangling: would it be possible to
use the mangled name stack to avoid clashes? I admit I am not sure how
much impact it would have, but would the following scenario be possible?

 * Building a mangled name
    - compute the filename hash code (modulo MANGLE_BASE^2). Check in the
mangled stack whether there is already another file with that hash code.
    - if not, build the name and return it to whoever asked for it
    - if so, we have a clash. In that case, change the hash code
until it no longer clashes (e.g. increment it or use cleverer hash table
techniques)

 * Using the name
    - As long as you find the mangled name in the mangled stack,
everything's fine. Note that the stack may hold two long names that
would normally clash, but we returned a modified name for one of them
precisely to avoid the clash. As long as this modified name is still in
the mangled stack, we don't have any problem.
    - The mangled name is not in the stack. I don't know what
currently happens in this case. I guess we can take all the files in
that directory and compute their mangled names. If we find one that
matches, good. It may not be the right file, but that would be no worse
than the current situation.
    - If no perfect match is found, we could try to find files for which
everything but the mangling part matches, e.g. if we look for
'foo f~sc.txt' and we find a 'foo file.txt', then that must be it. That
is, unless it was a file we just deleted.


	Anyone care to comment on this? Is there any major flaw in
this scheme? Would anyone be interested in implementing it (it may take
time before I get to it, if at all)?

--
Francois Gouget
fgouget at multimania.com

-------------- next part --------------
--- samba-2.0.3-ref/source/smbd/mangle.c	Sat Feb 27 14:09:09 1999
+++ samba-2.0.3-tst2/source/smbd/mangle.c	Tue Mar  9 16:00:38 1999
@@ -63,11 +63,13 @@
  *                  global.  There is a call to lp_magicchar() in server.c
  *                  that is used to override the initial value.
  *
- * basechars      - The set of 36 characters used for name mangling.  This
+ * MANGLE_BASE    - This is the number of characters we use for name mangling.
+ *
+ * basechars      - The set of characters used for name mangling.  This
  *                  is static (scope is this file only).
  *
- * base36()       - Macro used to select a character from basechars (i.e.,
- *                  base36(n) will return the nth digit, modulo 36).
+ * mangle()       - Macro used to select a character from basechars (i.e.,
+ *                  mangle(n) will return the nth digit, modulo MANGLE_BASE).
  *
  * chartest       - array 0..255.  The index range is the set of all possible
  *                  values of a byte.  For each byte value, the content is a
@@ -110,12 +112,13 @@
 
 char magic_char = '~';
 
-static char basechars[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
+static char basechars[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_-!@#$%";
+#define MANGLE_BASE       (sizeof(basechars)/sizeof(char)-1)
 
 static unsigned char chartest[256]  = { 0 };
 static BOOL          ct_initialized = False;
 
-#define base36(V) ((char)(basechars[(V) % 36]))
+#define mangle(V) ((char)(basechars[(V) % MANGLE_BASE]))
 #define BASECHAR_MASK 0xf0
 #define ILLEGAL_MASK  0x0f
 #define isbasechar(C) ( (chartest[ ((C) & 0xff) ]) & BASECHAR_MASK )
@@ -876,7 +883,7 @@
               }
             else 
               {
-              extension[extlen++] = base36( (unsigned char)*p );
+              extension[extlen++] = mangle( (unsigned char)*p );
               }
             p += 2;
             break;
@@ -910,7 +917,7 @@
           }
         else 
           {
-          base[baselen++] = base36( (unsigned char)*p );
+          base[baselen++] = mangle( (unsigned char)*p );
           }
         p += 2;
         break;
@@ -927,10 +934,10 @@
     }
   base[baselen] = 0;
 
-  csum = csum % (36*36);
+  csum = csum % (MANGLE_BASE*MANGLE_BASE);
 
   (void)slprintf(s, 12, "%s%c%c%c",
-                 base, magic_char, base36( csum/36 ), base36( csum ) );
+                 base, magic_char, mangle( csum/MANGLE_BASE ), mangle( csum ) );
 
   if( *extension )
     {
-------------- next part --------------
--- samba-2.0.3-ref/source/smbd/mangle.c	Sat Feb 27 14:09:09 1999
+++ samba-2.0.3-tst2/source/smbd/mangle.c	Tue Mar  9 16:00:38 1999
@@ -828,7 +831,7 @@
  */
 void mangle_name_83( char *s)
   {
-  int csum = str_checksum(s);
+  int csum;
   char *p;
   char extension[4];
   char base[9];
@@ -850,7 +853,11 @@
       csum = str_checksum( s );
       *p = '.';
       }
+    else
+      csum = str_checksum(s);
     }
+  else
+    csum = str_checksum(s);
 
   strupper( s );
 


More information about the samba-technical mailing list