Problem with connection database

Mon Aug 27 13:10:08 GMT 2001

Hi Claus,
I haven't had as much time as I would like to look at this, and I'm not as
up on the tdb code as I would like,
so I'm posting my initial observations back to the samba-technical list as
well, in the hope that they will suggest a course of action to someone more
knowledgeable. To me it almost looks like a race condition of some sort:

The bad magic number reports in the log all boil down to the
following magic numbers:

magic 0x42424242
magic 0x0
magic 0x10000
magic 0x3122
magic 0x41ed
magic 0x4d8
magic 0xd9fee666

And of course

magic 0x26011999  is reported when a free call is made, because this is
TDB_MAGIC, not TDB_FREE_MAGIC. What does this mean? Perhaps something
failed before changing the magic number to the free magic, but the record
still somehow ended up in the list to be freed?
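
To make the grouping above concrete, here is a minimal sketch of how a
header magic could be classified. The helper name classify_magic and the
SKETCH_ constants are made up for this mail and are not part of tdb. I am
assuming that 0xd9fee666 is the free magic (it is the bitwise complement of
0x26011999) and that 0x42424242 is a run of four 0x42 fill bytes.

```c
#include <stdint.h>

/* Values quoted in this mail; the names are local to this sketch. */
#define SKETCH_TDB_MAGIC      0x26011999u
#define SKETCH_TDB_FREE_MAGIC (~SKETCH_TDB_MAGIC)  /* == 0xd9fee666 */
#define SKETCH_PAD_WORD       0x42424242u          /* assumed: four 0x42 fill bytes */

/* Hypothetical helper, not a tdb function: describe a header magic. */
static const char *classify_magic(uint32_t magic)
{
    if (magic == SKETCH_TDB_MAGIC)
        return "in-use record";
    if (magic == SKETCH_TDB_FREE_MAGIC)
        return "free record";
    if (magic == SKETCH_PAD_WORD)
        return "fill bytes, no header written";
    return "unknown, probably not a record header";
}
```

Under those assumptions, 0x0, 0x10000, 0x3122, 0x41ed and 0x4d8 all fall
into the last bucket, which is what you would expect if the offset being
read does not point at a record header at all.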

The offsets reported all boil down to:

0x0 offset=18676
0x0 offset=2
0x0 offset=24113
0x0 offset=26003
0x0 offset=26019
0x0 offset=4589
0x0 offset=620
0x0 offset=64
0x0 offset=65536
0x10000 offset=2834
0x3122 offset=0
0x41ed offset=45336
0x42424242 offset=1316
0x42424242 offset=18676
0x42424242 offset=2556
0x42424242 offset=5656
0x42424242 offset=620
0x42424242 offset=6276
0x42424242 offset=6896
0x42424242 offset=696
0x42424242 offset=7516

0x4d8 offset=25496
0xd9fee666 offset=11856
0xd9fee666 offset=15576
0xd9fee666 offset=16816
0xd9fee666 offset=18676
0xd9fee666 offset=45336
0xd9fee666 offset=696
0xd9fee666 offset=73596
0xd9fee666 offset=88740

The offsets where it reports bad magic because the value is NOT
TDB_FREE_MAGIC are:

0x26011999 offset=16816
0x26011999 offset=18676
0x26011999 offset=49156

OK, we are trying to FREE the above three offsets;
the first two also got bad magic when they were being
read or written. Comparing what was going on when each was being freed
with what was going on when it was being allocated, read, or written, and
the timing of the two, might give us some clues.

We also have a bunch of "tdb_oob len xxxxxxx beyond eof" reports

at the following len/eof values:

-1627389928 65536
-2147483577 65536
-788529101 65536
1109419649 73732
1111638598 73732   >>> this is 0x42424246
1111638598 90112   in fact, all of the
1111638618 24576   values in this range look
1111638618 65536   like freshly initialized memory
1111638618 73732   (0x42 bytes) except for the last byte <<<
131137560 65536    
1346978887 65536
16777247 65536
181207044 65536
181207044 73732
485512 65536
65560 65536
875573320 65536
925905992 65536
956301336 65536
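
The remark about the 11116386xx values is easier to check in hex. Here is a
small sketch that counts how many of the four bytes of a reported len are
0x42; the helper name count_pad_bytes is made up, and treating 0x42 as the
fill byte is my assumption from the 0x42424242 magic seen earlier.

```c
#include <stdint.h>

/* Hypothetical helper: how many of the four bytes in `len` equal 0x42? */
static int count_pad_bytes(uint32_t len)
{
    int n = 0;

    for (int i = 0; i < 4; i++) {
        if (((len >> (8 * i)) & 0xffu) == 0x42u)
            n++;
    }
    return n;
}
```

1111638598 is 0x42424246 and 1111638618 is 0x4242425a, so both give 3: three
fill bytes with only the low byte changed. The negative values in the list
need a cast to uint32_t before they are passed in.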

With all this, it APPEARS that a link in the list is broken, so we are just
following out to 'random' offsets in the tdb file; we are prevented from
making really BAD mistakes by the fact that the TDB_MAGIC for each record is
checked, and processing stops at a bad magic number; but how the link got
corrupted to begin with is still eluding me...
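
To illustrate what I mean by following a broken link, here is a simplified
sketch of a chain walk that stops at the first bad magic or out-of-range
offset. The struct sketch_rec layout and the walk_chain function are
invented for this mail; the real tdb record header has more fields and the
real code is structured differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAGIC 0x26011999u   /* the TDB_MAGIC value quoted above */

/* Simplified header for illustration; NOT the real tdb on-disk layout. */
struct sketch_rec {
    uint32_t next;   /* file offset of the next record, 0 ends the chain */
    uint32_t magic;  /* must be SKETCH_MAGIC for a live record */
};

/* Follow the chain starting at `off` inside `file` (length `len`).
 * Returns the number of good records seen. If the walk stops early,
 * *bad_off is set to the offending offset; otherwise it is 0. */
static size_t walk_chain(const unsigned char *file, size_t len,
                         uint32_t off, uint32_t *bad_off)
{
    size_t good = 0;
    size_t max_steps = len / sizeof(struct sketch_rec);

    *bad_off = 0;
    while (off != 0) {
        struct sketch_rec rec;

        /* Header would run past end of file (the "beyond eof" case),
         * or we have taken too many steps (a cycle in the chain). */
        if (off > len || len - off < sizeof rec || good >= max_steps) {
            *bad_off = off;
            break;
        }
        memcpy(&rec, file + off, sizeof rec);

        /* Bad magic: stop here instead of trusting rec.next. */
        if (rec.magic != SKETCH_MAGIC) {
            *bad_off = off;
            break;
        }
        good++;
        off = rec.next;
    }
    return good;
}
```

If one record's next field is overwritten, every later step lands on whatever
happens to be at that offset, which would explain the mix of fill bytes,
free magic and apparently random values reported above.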

I haven't been able to pin down the initial corruptor in the logs; we may
have missed this, and are now seeing only the results of a corruption whose
log entry has already wrapped.

Hope this will suggest something to someone more knowledgeable about tdb
than I am.  I still have your logs, etc., if someone else on the list
wishes to help out.
Don 

-----Original Message-----
From: Claus Svarer [mailto:csvarer at nru.dk]
Sent: Friday, August 24, 2001 5:33
To: MCCALL,DON (HP-USA,ex1)
Subject: Problem with connection database

Hi Don

Our server has now reproduced the connections.tdb error. I have
made a gzipped tar file of all the log and lock files that were on the
server when it crashed (attached to this mail). I don't know if you can
find out what happens, but as you know the error message is:

fog:/users/csvarer 22 : smbstatus | more
INFO: Debug class all level = 1   (pid 843 from pid 843)
tdb(/usr/local/samba/var/locks/connections.tdb): rec_read bad magic 0x42424242 at offset=5656

If I search in the log file, I can find several machines that see this
problem. At the time we observed the problem, no new machines were
authorizing against the Samba server.

Best regards
Claus