SID hash

Peter Samuelson peter at cadcamlab.org
Tue Aug 8 01:23:43 GMT 2000


[Tim Cole]
> Oh.  Heh, yes.  The definition of "good hash" varies greatly
> depending on the application :P
> 
> Eh, in this case, for use in a hashtable, probably sorted linear for
> the moment.

Then I agree with Elrond -- just add 'em up and mod by something.
Complex hash functions are most useful when the data is non-random in
some way, like if it's concentrated around powers of two.  In this
case:

SIDs from multiple domains are truly random, or close to it, so any
hash function should be OK.  Ignore that case, for now.

SIDs in the same domain are identical but for RID, so consider the RID.
It is not random; I think on NT it's usually a small number greater
than 500.  On Samba it's related to the UID, which in some
organizations is very non-random; for example, I categorize my users by
multiples of 1000, so most people are between 1000N and 1000N+50 for
five or six values of N.

So for NT RIDs, just add and mod by anything you want.  If you need to
consider Samba RIDs, it's probably best to use a prime number of
buckets, like they always say about hashing anyway.
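
A minimal sketch of the "add 'em up and mod" idea, for illustration
only.  The struct below is a simplified, hypothetical stand-in and not
Samba's actual SID type; the names sid_sketch and sid_hash_sketch are
made up for this example.  It assumes the last sub-authority is the RID
and that nbuckets is nonzero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified SID layout; not Samba's real structure. */
#define SKETCH_MAX_SUBAUTHS 15

struct sid_sketch {
    uint8_t  num_subauths;                   /* entries of sub_auth in use */
    uint32_t sub_auth[SKETCH_MAX_SUBAUTHS];  /* last used entry is the RID */
};

/*
 * Add up the sub-authorities and reduce modulo the bucket count.
 * The caller must pass nbuckets > 0.
 */
static size_t sid_hash_sketch(const struct sid_sketch *sid, size_t nbuckets)
{
    uint32_t sum = 0;

    for (uint8_t i = 0; i < sid->num_subauths && i < SKETCH_MAX_SUBAUTHS; i++)
        sum += sid->sub_auth[i];   /* unsigned overflow simply wraps */

    return (size_t)sum % nbuckets;
}
```

Within one domain only the RID changes, so the sum is the RID plus a
constant and the bucket is just (RID + constant) mod nbuckets.  That is
why the distribution of RIDs, and the choice of bucket count, decide
how evenly the table fills.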

You'd want to analyze your hash chains on typical input data.  That's
the best way to determine if a hash function is good....
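
One way to do that analysis, as a rough sketch: feed a sample of RIDs
through the bucket calculation and count how many buckets get used and
how long the longest chain is.  The names chain_stats and
measure_chains are hypothetical.  It hashes the bare RID, on the
assumption that the rest of a same-domain SID only adds a constant
offset and so does not change the chain lengths.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct chain_stats {
    size_t used_buckets;    /* buckets holding at least one entry */
    size_t longest_chain;   /* number of entries in the fullest bucket */
};

/*
 * Count bucket usage for rids[0..nrids-1] with a plain "mod nbuckets"
 * hash.  Returns 0 on success, -1 if nbuckets is 0 or allocation fails.
 */
static int measure_chains(const uint32_t *rids, size_t nrids,
                          size_t nbuckets, struct chain_stats *out)
{
    size_t *counts;

    if (nbuckets == 0)
        return -1;
    counts = calloc(nbuckets, sizeof *counts);
    if (counts == NULL)
        return -1;

    out->used_buckets = 0;
    out->longest_chain = 0;
    for (size_t i = 0; i < nrids; i++) {
        size_t b = rids[i] % nbuckets;

        if (counts[b]++ == 0)
            out->used_buckets++;        /* first entry in this bucket */
        if (counts[b] > out->longest_chain)
            out->longest_chain = counts[b];
    }

    free(counts);
    return 0;
}
```

For the layout described earlier (six blocks of 50 users starting at
multiples of 1000) with 1000 buckets, this reports 50 used buckets and
a longest chain of 6.  Running it with a few candidate bucket counts,
prime and otherwise, against real RIDs shows which ones actually spread
the data.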

Peter




More information about the samba-technical mailing list