Verifying uniform distribution of random numbers

Tue Apr 9 17:19:43 EST 2002

On Tue, 9 Apr 2002 16:19, Gordon Deane wrote:
<snip>
> The uniform distribution on a finite space is exceptionally simple.  I
> doubt you need to know about hypothesis testing arbitrary distributions; if
> your bin counts end up roughly even after a few million runs or so you are
> probably as uniform as anyone would reasonably expect.  I mean, a few %
> deviation on a particular IP is unlikely to be an issue, is it?  It would
> seem to me that the libc random number generator ought to be good enough to
> meet this spec at first glance.  Of course, your mapping from this onto
> your 65024 bins needs to preserve the uniformity.
1<->1.

> That begs the question of how you seed the generator.  Does it matter
> whether a malicious observer could guess the IP you create?   If you do
> this on boot, there may not be much real new entropy in /dev/random,
> although if you count on having alread restored the saved entropy pool that
> would be OK.
It is seeded from the ethernet address (by spec). It had better not matter 
matter what the IP is, because the next stage is to ARP for that address.

> Assuming it doesn't, the system clock would be another choice.  I wonder,
> though, if I had 60 identical computers and the power failed (too long for
> a UPS to cope, but we'll assume an orderly shutdown).  When they come up,
> it is likely to be at very much the same time, perhaps even to the
> microsecond in some cases.  They can't all have the same IP.  I suppose the
> Ethernet collision avoidance ultimately only allows one at a time to make
> an ARP request.
The intent is that given a unique seed, then machines won't try the same 
pattern of IPs.

<snip>

> As Jeremy points out, you could get a uniform distribution just by counting
> from 1 to N.  The problem is that you want to avoid conflicting with
> existing devices on the network without querying first, and we assume they
> used the same algorithm.  The spec writers have figured that a random guess
> will be the quickest way of doing this, assuming you can come up with a
> different seed to the other devices.  If it fails, I gather you just
> brute-force your way until you succeed.
Yep, just try again.

<snip>
> Yes, because in crypto a number generator that can be second-guessed or
> that isn't quite uniform might seriously weaken the security.  That's
> unlikely to be an issue here.  Only worry about details where you need it. 
> Remember that the better the RNG, the slower it is likely to be, and the
> more code involved; ask yourself whether you really ought to bother.
I don't really care what the RNG is, because my unit test just hammers 
whatever implementation is used. IIRC, it is part of the rand48() family.

<snip>
> A statistician could give you a better answer, but in the absence of
> confidence levels I would have thought that "near enough was good enough",
> and I would wager that was the intent of the spec writers.
I am trying to do automated testing, so while an eyeball of a distribution 
plot would normally be my preferred option, I need something that comes up 
with a Go / NoGo answer.

> If the RNG is any good your bin counts should have a normal distribution,
> and you can calculate the variance & std. deviation.
>
> The theoretical mean is n*p = num trials / num bins
> Theoretical variance = n*p*(1-p) = trials * (1/bins) * (1-1/bins)
>                                  ~= trials * 2e-5
> Std. deviation = sqrt(Variance) = sqrt(trials) * 4e-3
> so for 10 million (1e8) trials, you would expect a mean of 1537.9 and a
> std. dev of about 4.
>
> The chances of getting bins outside +-6*std dev should be very small, but
> about 98% should be inside 2*std dev, ie, 1538+-8.  Set a confidence
> interval of +-100, say that's good enough, and anything outside that really
> would be non-uniform.
>
> In your units tests, one good result would just be to verify that each IP
> can actually occur, (no bin is empty) hence your RNG has coverage; that's
> more likely to be a problem than lack of uniformity in distribution where
> it does cover.
And a very appreciated 10c worth it was too.
So the unit tests are basically:
1. to verify no empty bins
2. to verify (say) 95% of the bins are within (say) 1538+-100.

Thanks a heap.

Brad