Name mangling patch
Francois Gouget
fgouget at psn.net
Thu Mar 18 06:59:16 GMT 1999
Maybe we could use a third digit as a way to avoid clashes: use two
digits when all is well, and three when we detect a clash.
But let me start again from the beginning.
On Thu, 18 Mar 1999, Christopher R. Hertel wrote:
> Luke,
>
> Not really. His first patch fixes (I haven't checked it, but it's
> supposed to fix) that annoying problem I noticed but never tracked down,
> in which we were calling the mangling routines once too often.
I'm not sure it's related. One part of the patch was about avoiding
calling str_checksum twice *from* the name-mangling code. So,
unfortunately, it's probably unrelated.
> So, the clash-avoidance wouldn't slow us down any more than it does now.
> Remember, too, that we're in a splay tree so speed shouldn't be a problem.
>
> Thing is, and I have to check this, that the cache is there to allow
> bi-directional mangling, among other things. If we do clash avoidance,
> then the algorithm *must* be bi-directional and consistent.
>
> What if we have two names which mangle to the same thing (this is a very
> unreal example):
>
> Unix name          DOS (mangled) name
> name one.bugs      name~.bxx
> name two.bugs      name~.bxx
>
> Conflict. So, we detect the conflict and remangle the second name.
> Cool, no conflict. Except...
>
> Someone deletes the first file (either on the DOS or Unix side, doesn't
> matter). When we next visit the directory (i.e., no cache in use), the
> other file's mangled name will have changed! This is *no good*
> (particularly because the user will say "Oy! I thought I'd deleted that
> file! I'll just delete it again! Oy! Where's the other file?".
But is the current algorithm any better?
Let's say we have (to stick a little closer to reality):
Unix name          DOS (mangled) name
name one.bugs      name~aa.bug
name two.bugs      name~aa.bug
I think one of the problems I'm having with 'del /s' is the
following:
1. del enumerates the directory's contents and somehow builds a list. It
gets two entries called 'name~aa.bug'.
2. It deletes the first one; so far, so good.
3. It tries to delete the second one, but this fails because 'name~aa.bug'
still maps to 'name one.bugs' in the cache, and that file has already been
deleted.
4. del stops at step 3, but the directory is not empty, so we would
not be able to delete it, potentially creating more problems.
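The failure sequence above can be modelled in a few lines. This is a hypothetical sketch, not Samba's code: `dir_files`, `cache`, `enumerate`, `unmangle`, and `delete_file` are illustrative names, the mangler is the same deliberately weak toy (both files map to one name), and the cache lookup returns the first matching entry, as described in step 3.

```c
#include <stdio.h>
#include <string.h>

#define MANGLED_LEN 13 /* 8.3 name plus the terminating NUL */
#define MAX_FILES 8

/* Toy mangler (hypothetical): four-character stem, '~', two letters from
 * the base-name length, three-character extension. */
static void toy_mangle(const char *unix_name, int bump, char out[MANGLED_LEN])
{
    const char *dot = strrchr(unix_name, '.');
    size_t base_len = dot ? (size_t)(dot - unix_name) : strlen(unix_name);
    unsigned sum = (unsigned)(base_len + (size_t)bump) % (26 * 26);

    snprintf(out, MANGLED_LEN, "%.*s~%c%c.%.3s",
             (int)(base_len < 4 ? base_len : 4), unix_name,
             'a' + (int)(sum / 26), 'a' + (int)(sum % 26),
             dot ? dot + 1 : "");
}

/* Hypothetical model: the files that still exist, and a cache mapping
 * mangled names back to Unix names. */
static const char *dir_files[MAX_FILES];
static int dir_count;
static struct {
    char mangled[MANGLED_LEN];
    const char *unix_name;
} cache[MAX_FILES];
static int cache_count;

/* Enumerate WITHOUT clash detection: clashing files share one name. */
static void enumerate(char listing[][MANGLED_LEN])
{
    for (int i = 0; i < dir_count && cache_count < MAX_FILES; i++) {
        toy_mangle(dir_files[i], 0, listing[i]);
        strcpy(cache[cache_count].mangled, listing[i]);
        cache[cache_count++].unix_name = dir_files[i];
    }
}

/* Unmangle via the cache: the first matching entry wins. */
static const char *unmangle(const char *mangled)
{
    for (int i = 0; i < cache_count; i++)
        if (strcmp(cache[i].mangled, mangled) == 0)
            return cache[i].unix_name;
    return NULL;
}

/* Delete by mangled name: 0 on success, -1 if the resolved Unix file
 * no longer exists. */
static int delete_file(const char *mangled)
{
    const char *target = unmangle(mangled);

    for (int i = 0; target != NULL && i < dir_count; i++) {
        if (strcmp(dir_files[i], target) == 0) {
            dir_files[i] = dir_files[--dir_count]; /* remove from directory */
            return 0;
        }
    }
    return -1;
}
```

Listing a directory holding 'name one.bugs' and 'name two.bugs' yields the same name twice; the first delete succeeds, and the second resolves again to the already-deleted 'name one.bugs' and fails, leaving one file behind.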
It seems a bit odd to me that del builds a list of the files to
delete before deleting them. Or perhaps something in Samba makes it
look that way.
With the scheme I propose, it should work like this:
1. del enumerates the directory. We return 'name~aa.bug' for 'name
one.bugs'.
2. We detect a clash for 'name two.bugs' and therefore return
'name~bb.bug'. Only then would we incur a performance penalty. Both
names are put in the mangled stack.
3. del deletes 'name~aa.bug': it's in the stack, no problem.
4. del deletes 'name~bb.bug': it's in the stack, no problem.
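The proposed sequence can be sketched with the same hypothetical model, changing only the enumeration: a name already handed out in the listing is remangled until it is unique, and every result is pushed on a stack (`mangled_stack` here is an illustrative array, not Samba's data structure). The toy suffixes come out as 'ai' and 'aj' rather than 'aa' and 'bb'.

```c
#include <stdio.h>
#include <string.h>

#define MANGLED_LEN 13 /* 8.3 name plus the terminating NUL */
#define MAX_FILES 8

/* Toy mangler (hypothetical): four-character stem, '~', two letters from
 * the base-name length plus `bump`, three-character extension. */
static void toy_mangle(const char *unix_name, int bump, char out[MANGLED_LEN])
{
    const char *dot = strrchr(unix_name, '.');
    size_t base_len = dot ? (size_t)(dot - unix_name) : strlen(unix_name);
    unsigned sum = (unsigned)(base_len + (size_t)bump) % (26 * 26);

    snprintf(out, MANGLED_LEN, "%.*s~%c%c.%.3s",
             (int)(base_len < 4 ? base_len : 4), unix_name,
             'a' + (int)(sum / 26), 'a' + (int)(sum % 26),
             dot ? dot + 1 : "");
}

static const char *dir_files[MAX_FILES]; /* files that still exist */
static int dir_count;
static struct {
    char mangled[MANGLED_LEN];
    const char *unix_name;
} mangled_stack[MAX_FILES];
static int stack_count;

/* Enumerate WITH clash detection: bump a name until it is unique in this
 * listing, then push the mapping on the stack. */
static void enumerate(char listing[][MANGLED_LEN])
{
    for (int i = 0; i < dir_count && stack_count < MAX_FILES; i++) {
        int bump = 0, clash;
        do { /* extra mangling passes only happen on a clash */
            toy_mangle(dir_files[i], bump++, listing[i]);
            clash = 0;
            for (int j = 0; j < i; j++)
                if (strcmp(listing[i], listing[j]) == 0)
                    clash = 1;
        } while (clash);
        strcpy(mangled_stack[stack_count].mangled, listing[i]);
        mangled_stack[stack_count++].unix_name = dir_files[i];
    }
}

/* Unmangle via the stack, most recent entry first. */
static const char *unmangle(const char *mangled)
{
    for (int i = stack_count - 1; i >= 0; i--)
        if (strcmp(mangled_stack[i].mangled, mangled) == 0)
            return mangled_stack[i].unix_name;
    return NULL;
}

/* Delete by mangled name: 0 on success, -1 if the target is gone. */
static int delete_file(const char *mangled)
{
    const char *target = unmangle(mangled);

    for (int i = 0; target != NULL && i < dir_count; i++) {
        if (strcmp(dir_files[i], target) == 0) {
            dir_files[i] = dir_files[--dir_count];
            return 0;
        }
    }
    return -1;
}
```

Because each listed name is unique and still on the stack, both deletes resolve to the right file and the directory ends up empty.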
Now, what happens if you try to map the mangled names back to the
Unix files once they are no longer in the mangled stack (assuming we did
not delete them)?
1. Open 'name~aa.bug'. We have two files; which one should we open?
We have no idea, but this is nothing new. Take the first one or flip a
coin.
2. Open 'name~bb.bug'. None of the Unix files mangle to this name, but
we find two files that mangle to a name of the form 'name~XX.bug'.
These may be false matches. Either we refuse the partial match, in which
case mangled names resulting from clash avoidance can only be used while
they are in the mangled stack, or we accept the match, which can lead to
the wrong file being opened. That is no worse than the current situation,
except when the file the name referred to was deleted in the meantime.
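The two lookup policies for names that have left the stack can be sketched as follows. This is a hypothetical illustration: `unmangle_from_dir` re-mangles every file in the directory with the same toy mangler, returns the first exact match, and otherwise falls back to a 'name~XX.bug' partial match only when `accept_partial` is set.

```c
#include <stdio.h>
#include <string.h>

#define MANGLED_LEN 13 /* 8.3 name plus the terminating NUL */

/* Toy mangler (hypothetical): four-character stem, '~', two letters from
 * the base-name length plus `bump`, three-character extension. */
static void toy_mangle(const char *unix_name, int bump, char out[MANGLED_LEN])
{
    const char *dot = strrchr(unix_name, '.');
    size_t base_len = dot ? (size_t)(dot - unix_name) : strlen(unix_name);
    unsigned sum = (unsigned)(base_len + (size_t)bump) % (26 * 26);

    snprintf(out, MANGLED_LEN, "%.*s~%c%c.%.3s",
             (int)(base_len < 4 ? base_len : 4), unix_name,
             'a' + (int)(sum / 26), 'a' + (int)(sum % 26),
             dot ? dot + 1 : "");
}

/* True if a and b are identical except for the two letters after '~'. */
static int same_except_suffix(const char *a, const char *b)
{
    const char *ta = strchr(a, '~');
    const char *tb = strchr(b, '~');

    if (ta == NULL || tb == NULL || ta - a != tb - b)
        return 0;
    if (strncmp(a, b, (size_t)(ta - a)) != 0)
        return 0;
    if (strlen(ta) < 3 || strlen(tb) < 3)
        return 0;
    return strcmp(ta + 3, tb + 3) == 0;
}

/* Map a mangled name back to a Unix name without the stack. An exact match
 * wins (first one found); otherwise the first partial match is returned only
 * if accept_partial is set. Returns NULL when nothing is accepted. */
static const char *unmangle_from_dir(const char *names[], int n,
                                     const char *mangled, int accept_partial)
{
    char m[MANGLED_LEN];
    const char *partial = NULL;

    for (int i = 0; i < n; i++) {
        toy_mangle(names[i], 0, m);
        if (strcmp(m, mangled) == 0)
            return names[i];
        if (partial == NULL && same_except_suffix(m, mangled))
            partial = names[i];
    }
    return accept_partial ? partial : NULL;
}
```

With 'name one.bugs' and 'name two.bugs' in the directory, 'name~ai.bug' resolves to the first file. The clash-avoidance name 'name~aj.bug' is rejected when partial matches are refused; when they are accepted it resolves to 'name one.bugs', even though it was issued for 'name two.bugs', which is the wrong-file case described above.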
Note that there are other solutions: increasing the base and increasing
the number of digits. For the base, we may gain one more value, '~'. That
may be reasonable, especially since the user can choose which character
to use as the mangling character, and that choice is likely to clash with
one of our base characters anyway. If such clashes are acceptable, this
is fine.
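For a rough sense of what each option buys: with a base of b characters and d suffix digits, a given prefix has N(b, d) possible suffixes, and the 8-character name part keeps P(d) characters for the prefix once the mangling character is counted. These symbols are introduced only for illustration, and the size of the base is left open.

```latex
\begin{align*}
N(b, d) &= b^{d}, & P(d) &= 8 - 1 - d,\\
N(b+1, 2) - N(b, 2) &= (b+1)^{2} - b^{2} = 2b + 1,\\
N(b, 3) &= b \cdot N(b, 2), & P(3) &= 4 = P(2) - 1.
\end{align*}
```

So adding '~' to the base gains only 2b + 1 suffixes with two digits, while a third digit multiplies the count by b at the cost of one prefix character, leaving the four characters mentioned below.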
Maybe we could add one more digit: use three digits instead of two.
I understand that three digits leave only four characters for the
beginning of the name, so maybe we could use the third digit only to
avoid clashes. This should avoid the last problem described above.
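The hybrid idea (two suffix digits when all is well, three only on a clash) might look like this. It is a sketch under stated assumptions: `build_name`, `is_taken`, and `mangle_hybrid` are hypothetical, the suffix letters are base-26 'a'..'z', spaces are skipped in the prefix, and `checksum` stands in for whatever hash the real mangler computes.

```c
#include <string.h>

#define MANGLED_LEN 13 /* 8.3 name plus the terminating NUL */

/* Build an 8.3 name (hypothetical layout): up to 7 - digits characters of
 * the base name with spaces skipped, the mangling character '~', `digits`
 * base-26 letters taken from `value`, then up to three extension characters. */
static void build_name(const char *unix_name, int digits, unsigned value,
                       char out[MANGLED_LEN])
{
    const char *dot = strrchr(unix_name, '.');
    const char *end = dot ? dot : unix_name + strlen(unix_name);
    int k = 0;

    for (const char *p = unix_name; p < end && k < 7 - digits; p++)
        if (*p != ' ')
            out[k++] = *p;
    out[k++] = '~';
    for (int d = digits - 1; d >= 0; d--) { /* least significant letter last */
        out[k + d] = (char)('a' + value % 26);
        value /= 26;
    }
    k += digits;
    if (dot) {
        out[k++] = '.';
        for (int e = 0; e < 3 && dot[1 + e] != '\0'; e++)
            out[k++] = dot[1 + e];
    }
    out[k] = '\0';
}

/* True if `name` is already in use. */
static int is_taken(const char *name, char taken[][MANGLED_LEN], int ntaken)
{
    for (int i = 0; i < ntaken; i++)
        if (strcmp(name, taken[i]) == 0)
            return 1;
    return 0;
}

/* Two suffix letters when the name is free; on a clash, fall back to three
 * letters (one fewer prefix character) and bump until the name is free
 * or every three-letter suffix has been tried. */
static void mangle_hybrid(const char *unix_name, unsigned checksum,
                          char taken[][MANGLED_LEN], int ntaken,
                          char out[MANGLED_LEN])
{
    build_name(unix_name, 2, checksum, out);
    for (unsigned bump = 0;
         is_taken(out, taken, ntaken) && bump < 26u * 26u * 26u; bump++)
        build_name(unix_name, 3, checksum + bump, out);
}
```

If 'name one.bugs' and 'name other.bugs' are given the same checksum of 8, the first keeps the two-letter form 'nameo~ai.bug' and only the second, clashing name switches to 'name~aai.bug', giving up one prefix character for the extra digit.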
--
Francois Gouget
fgouget at multimania.com