Problem with locking.tdb

MCCALL,DON (HP-USA,ex1) don_mccall at hp.com
Mon Aug 13 18:42:12 GMT 2001


Hi Claus,
No, sorry.  What the message is telling us is that the tdb magic field of
the record we were trying to read held a bad magic number. This is a sort of
primitive check to make sure that the record we are reading is valid; it
SHOULD contain one of
TDB_MAGIC (0x26011999U)
OR, if the record is deleted,
TDB_DEAD_MAGIC (0xFEE1DEAD)
OR, if the record is free to be used,
TDB_FREE_MAGIC
In this case, it is NONE of these - being reported as 0xd9fee666

Since we never intentionally apply this value to a tdb record in the magic
field, it probably indicates that the record passed to rec_read() (which is
where this message is emitted) is invalid...
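
For reference, the kind of check involved can be sketched roughly as below.
This is an illustration only, not the code from tdb.c: magic_kind and
classify_magic are made-up names, and the value of TDB_FREE_MAGIC is an
ASSUMPTION (taken here as the bitwise complement of TDB_MAGIC), since its
value is not given above. That assumption may be worth checking against the
source, because 0xd9fee666 is exactly ~0x26011999; if it holds, the record
would be carrying the free marker rather than random garbage.

```c
#include <stdint.h>

/* Values quoted in this message. */
#define TDB_MAGIC      (0x26011999U)
#define TDB_DEAD_MAGIC (0xFEE1DEADU)
/* ASSUMPTION: not stated in this thread; taken as the complement of TDB_MAGIC. */
#define TDB_FREE_MAGIC ((uint32_t)~TDB_MAGIC)

/* Hypothetical classification of a record header's magic field. */
enum magic_kind { MAGIC_LIVE, MAGIC_DEAD, MAGIC_FREE, MAGIC_UNKNOWN };

static enum magic_kind classify_magic(uint32_t magic)
{
    if (magic == TDB_MAGIC)
        return MAGIC_LIVE;      /* record in use */
    if (magic == TDB_DEAD_MAGIC)
        return MAGIC_DEAD;      /* record deleted */
    if (magic == TDB_FREE_MAGIC)
        return MAGIC_FREE;      /* record on the free list (assumed value) */
    return MAGIC_UNKNOWN;       /* anything else: header is not a known type */
}
```

Calling classify_magic(0xd9fee666U) shows which of the categories the
reported value falls into under the assumption above.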

I note that you saw this previously in an earlier post, and the bad magic
number (0xd9fee666)  reported was the SAME, with a different offset
(offset=54016).

I don't know the significance of this, but I have cc'ed the samba-technical
mailing list in hopes one of the real Samba heads has some ideas.  You
should probably post to that list in the future, or at least cc them, when
you write to me, as you'll have a much better chance of someone
recognizing/understanding the situation than just my poor skills.


IF YOU ARE CONSISTENTLY getting this error message whenever you run
smbstatus now (not just a one-time occurrence), then it probably means there
is a corrupt record in the connections.tdb database.
It would be useful to be able to FIND that record, and maybe determine which
connection it was for, to get more info on the conditions that might have
caused this.
I don't have much time right now, but I have been interested in
investigating how tdbtool works anyway - so if you could send me offlist the
connections.tdb file and a level 10 debug of your smbstatus failing (assuming
that you can reproduce this), I'll try to take a look at it and see if I can
spot anything.

Hope this helps,
Don


-----Original Message-----
From: Claus Svarer [mailto:csvarer at nru.dk]
Sent: Monday, August 13, 2001 1:16 PM
To: MCCALL,DON (HP-USA,ex1)
Subject: Re: Problem with locking.tdb


Hi Don

Thanks for all your help so far. The new version downloaded from CVS has
now been running since Thursday morning without any problems. Today, though,
we got a new error, with a message like:

Samba version 2.2.1
Service      uid      gid      pid     machine
----------------------------------------------
tdb(/usr/local/samba/var/locks/connections.tdb): rec_read bad magic 0xd9fee666 at offset=24256

No locked files

so there must still be a problem with the connection database. Do you have
any idea of what has caused that?

The Samba server does still function, but when running smbstatus we get
that error message.

Best regards
Claus Svarer

"MCCALL,DON (HP-USA,ex1)" wrote:

> fixed in cvs head last night by Jerry....
> Sorry about that!
> Don
>
> -----Original Message-----
> From: Claus Svarer [mailto:csvarer at nru.dk]
> Sent: Wednesday, August 08, 2001 4:20 AM
> To: MCCALL,DON (HP-USA,ex1)
> Subject: Re: Problem with locking.tdb
>
> Hi Don
>
> You are right that after having compiled the new CVS version I get a "use
> mmap = No" if I grep the testparm output, so that should be OK. I then
> tried to run it and it also seems to be OK. To test the locking I then
> tried to run smbstatus. From that I get the following output:
>
> fog:/users/csvarer 26 : smbstatus | more
> INFO: Debug class all level = 3   (pid 8529 from pid 8529)
> Processing section "[homes]"
> Processing section "[netlogon]"
> Processing section "[profiles]"
> Processing section "[printers]"
> Processing section "[print$]"
> Processing section "[fax]"
> Processing section "[tmp]"
> Processing section "[programs]"
> Processing section "[prg_data]"
> Processing section "[f-prot-v5]"
> Processing section "[n_adm]"
> Processing section "[nruweb]"
> Processing section "[brain99]"
> Processing section "[all]"
> Failed to open byte range locking database
> ERROR: Failed to initialise locking database
>
> Samba version 2.2.1
> Service      uid      gid      pid     machine
> ----------------------------------------------
> ttsch        ttsch    users     8508   scl3     (130.226.104.223) Wed Aug 8 10:11:30 2001
> csvarer      csvarer  users     8419   csport   (130.226.104.174) Wed Aug 8 10:10:53 2001
> f-prot-v5    pc_adm   users     8419   csport   (130.226.104.174) Wed Aug 8 10:10:46 2001
> n_adm        pia      users     8500   pia      (130.226.104.212) Wed Aug 8 10:11:05 2001
> dorthe       dorthe   users     8512   dorthe   (130.226.104.213) Wed Aug 8 10:11:36 2001
> saznar       saznar   users     8413   euler    (130.226.104.172) Wed Aug 8 10:12:03 2001
> csvarer      csvarer  users     8419   csport   (130.226.104.174) Wed Aug 8 10:10:49 2001
> programs     khusted  users     8491   khusted  (130.226.104.206) Wed Aug 8 10:11:00 2001
> brain99      csvarer  users     8419   csport   (130.226.104.174) Wed Aug 8 10:10:52 2001
> f-prot-v5    saznar   users     8413   euler    (130.226.104.172) Wed Aug 8 10:10:40 2001
> tmp          csvarer  users     8419   csport   (130.226.104.174) Wed Aug 8 10:10:52 2001
> programs     csvarer  users     8419   csport   (130.226.104.174) Wed Aug 8 10:10:52 2001
> prg_data     csvarer  users     8419   csport   (130.226.104.174) Wed Aug 8 10:10:52 2001
>
> Can't initialise locking module - exiting
>
>
> I believe this means that there must still be some problem with the
> locking database? Or do you know if it is just a result of the new mmap
> setup?
>
> Claus
>
>
>
>
>
> "MCCALL,DON (HP-USA,ex1)" wrote:
>
> Hi Claus,
> I believe that what Jeremy did was to add a smb.conf parameter use mmap,
> which for HP should default to no.  So configure still finds that mmap
> exists, and compiles Samba so that it CAN use mmap, but control of whether
> it does or not is by the smb.conf parameter.  That way, when we DO get a
> unified cache on HPUX, we won't have to change any code, but just turn on
> the smb.conf parameter to take advantage of mmap.
> You can verify this by running testparm after you have built the product
> and grep for mmap....
> I haven't gotten the latest cvs installed myself, so let me know...
> Don
> -----Original Message-----
> From: Claus Svarer [mailto:csvarer at nru.dk]
> Sent: Tuesday, August 07, 2001 11:07 AM
> To: MCCALL,DON (HP-USA,ex1)
> Subject: Re: Problem with locking.tdb
>
> Hi Don
>
> I have now downloaded this CVS branch and installed it, called the
> configure script and have now looked into include/config.h. It seems that
> it still believes it should be using mmap, since the two lines:
>
> #define HAVE_MMAP 1
>
> are still included. Is that correct or should I comment them out manually?
>
> BR
> Claus
>
> "MCCALL,DON (HP-USA,ex1)" wrote:
>
> > Hi Claus,
> > Jeremy and I talked about this, and bottom line, because HP-UX doesn't
> > have a unified cache between mmapped files and regular files, there is
> > currently no way to ensure that a tdb file will not eventually become
> > 'corrupt'.
> > So the only way to ensure that this doesn't happen is to turn off mmap
> > usage for samba on HP-UX till we get a unified cache sometime in the
> > future.
> > Please pull down the latest CVS version of SAMBA_2_2 and build that -
> > Jeremy has checked in fixes that will disable mmap usage on HP-UX, as well
> > as a few corner cases where .tdb files got messed up for other reasons.
> > See if you still have the same issue after that.
> > Thanks,
> > Don
> > -----Original Message-----
> > From: Claus Svarer [mailto:csvarer at nru.dk]
> > Sent: Thursday, August 02, 2001 7:29 AM
> > To: MCCALL,DON (HP-USA,ex1)
> > Subject: Re: Problem with locking.tdb
> >
> > Hi Don
> >
> > Thank you for your help. I believe I should report on our experience
> > using the newly compiled Samba server (with MAP_FIXED) for a week. It has
> > been running without any problems until this morning, but then we got
> > this (from smbstatus):
> >
> > Samba version 2.2.1a
> > Service      uid      gid      pid     machine
> > ----------------------------------------------
> > n_adm        pia      users    10742   pia      (130.226.104.212) Thu Aug 2 12:27:15 2001
> > tdb(/usr/local/samba/var/locks/connections.tdb): rec_read bad magic 0xd9fee666 at offset=54016
> >
> > and we have some strange problems connecting to the Samba server (not
> > surprising with an error in the connection database). Restarting Samba
> > solves the problem.
> >
> > Do you have any idea what the reason for this problem could be? Do you
> > know if the connection database is created in the same way, although mmap
> > is not used?
> >
> > Once more, thanks for helping to solve the first problem. Samba runs very
> > nicely as a Windows 2000 domain logon server.
> >
> > Best regards
> > Claus Svarer
> >
> > "MCCALL,DON (HP-USA,ex1)" wrote:
> >
> > > Hi Claus,
> > > In fact, I added a number of TDB_LOG statements to tdb.c, where tdb_mmap
> > > (which calls mmap) is located, to try to track this down, and I DO see
> > > the issue with mmap periodically returning a pa of 0.  This is NOT
> > > (according to the man page) supposed to be possible, in the way that
> > > samba is calling mmap, and in fact samba in several places depends on
> > > this not happening - mmap is supposed to return MAP_FAILED in case of a
> > > problem; it should never return NULL as a pa if the addr that you pass
> > > to it is 0.
> > > This is why, in my first message (answering your second message - I must
> > > live backwards in time, like Merlin ;-)  just wish I was as smart!) I
> > > asked you to try adding the MAP_FIXED flag to the call to mmap() in
> > > tdb.c.  If you don't specify MAP_FIXED, then the pa that is returned is
> > > left up to the mmap code, and it has more leeway as to where to put the
> > > chunk of memory you are asking for.  Doing this allowed me to
> > > consistently (so far) pass the smbtorture LOCK1-4 tests where I was
> > > having failures (LOCK1) before; but this does not involve the
> > > connections database, only the several locking databases.
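
A minimal sketch of the defensive check described above: treat both
MAP_FAILED and a null address as "not mapped", and fall back to ordinary
file reads in that case, in the spirit of the read/write fall-through
mentioned elsewhere in this thread. The names struct mapped_file, try_map
and read_at are hypothetical and only for illustration; this is not Samba's
tdb code.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical holder for a file that may or may not be mapped. */
struct mapped_file {
    int fd;
    void *map_ptr;    /* NULL means "not mapped, use pread()" */
    size_t map_size;
};

/* Try to map the file read-only; leave map_ptr NULL on any failure. */
static void try_map(struct mapped_file *mf, int fd, size_t size)
{
    void *p;

    mf->fd = fd;
    mf->map_ptr = NULL;
    mf->map_size = 0;
    if (size == 0)
        return;

    p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return;                 /* the documented failure value */
    if (p == NULL) {
        munmap(p, size);        /* unexpected address 0: drop the mapping */
        return;
    }
    mf->map_ptr = p;
    mf->map_size = size;
}

/* Read len bytes at off: via the mapping if present, else via pread(). */
static int read_at(const struct mapped_file *mf, off_t off,
                   void *buf, size_t len)
{
    if (off < 0)
        return -1;
    if (mf->map_ptr != NULL) {
        if ((size_t)off > mf->map_size || len > mf->map_size - (size_t)off)
            return -1;          /* request runs past the mapped region */
        memcpy(buf, (const char *)mf->map_ptr + off, len);
        return 0;
    }
    return pread(mf->fd, buf, len, off) == (ssize_t)len ? 0 : -1;
}
```

The point of the design is that every reader goes through one function that
decides between the mapping and the file descriptor, so no caller touches
map_ptr directly when it is unset.
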
> > >
> > > I don't remember, which version of Samba are you running? 2.2.1a?
> > >
> > > So let me know,
> > > Thanks
> > > Don
> > >
> > > -----Original Message-----
> > > From: Claus Svarer [mailto:csvarer at nru.dk]
> > > Sent: Wednesday, July 25, 2001 6:46 AM
> > > To: MCCALL,DON (HP-USA,ex1)
> > > Subject: Re: Problem with locking.tdb
> > >
> > > Hello Don,
> > >
> > > Thanks for your advice, it seems to have solved the problem. At least it
> > > runs for longer periods now without any. It has now been running for
> > > 3 hours without any problems.
> > >
> > > I have once before had two other problems with the HP mmap feature; I
> > > don't know if they could be related to this problem.
> > >
> > > Problem 1:
> > > The first problem I solved by changing the mmap call from MAP_SHARED to
> > > MAP_PRIVATE. I know that this also changes the behaviour of mmap, but
> > > that did not matter in that program, because there the mmap feature was
> > > used only for reading the contents of the files, not for changing them.
> > > That program simply didn't work when mmap was called with MAP_SHARED;
> > > I was not able to track down why.
> > >
> > > Problem 2:
> > > In the same program mmap was used for mapping several different memory
> > > segments within the same file. If more than about 34 memory regions were
> > > allocated within the same file, somewhat dependent on the size of the
> > > file and the memory chunks (an image file of approx. 48 Mbyte with
> > > 1.1 Mbyte memory chunks), it suddenly began to return zero (as I
> > > remember it; it was several years ago and with HP-UX 10.20).
> > >
> > > Do you have any idea whether any of these problems is related to what
> > > we see now for mmap and the tdb feature?
> > >
> > > Best regards
> > > Claus Svarer
> > >
> > > PS There is one thing that I wonder a bit about. I asked about a similar
> > > problem with vers 2.2.0 two months ago, and I got several emails back
> > > saying that they had no problem with that Samba version and HP-UX. Do
> > > you have any idea if any of the kernel parameters could influence the
> > > behaviour of the mmap function (also related to the second problem I
> > > am describing)?
> > >
> > > "MCCALL,DON (HP-USA,ex1)" wrote:
> > >
> > > > Hello Claus,
> > > > We are seeing this as well.  running the smbtorture test LOCK1
> > > > will show this (and other issues as well).  I'm not done with
> > > > my investigation, but at the moment, it APPEARS to be an issue
> > > > with the mmap implementation on HP-UX, and the fact that HP-UX
> > > > uses different caches for memory mapped file access as opposed
> > > > to filesystem (read/write) access to files.  Samba 'fails thru'
> > > > to read/write when it detects that the tdb->map_ptr value is
> > > > zero for tdb_write and tdb_read, but some other tdb calls (I
> > > > THINK) are still accessing via the mmapped address.  According to
> > > > the man page for mmap, mmap should never map to 0, but additional
> > > > debug statements I have added to the code indicate that in some
> > > > circumstances, IT IS doing this.  Until we
> > > > get this resolved, you can probably workaround this problem
> > > > by changing the lines (there are 2 of them):
> > > >
> > > > #define HAVE_MMAP 1
> > > >
> > > > in the include/config.h file
> > > >
> > > > and then doing a "make clean",
> > > > and then a "make"
> > > > to rebuild samba WITHOUT mmap support.
> > > >
> > > > Let me know if this doesn't do it for you.
> > > > Don
> > > >
> > > > -----Original Message-----
> > > > From: Claus Svarer [mailto:csvarer at nru.dk]
> > > > Sent: Tuesday, July 24, 2001 7:22 AM
> > > > To: samba-technical at lists.samba.org
> > > > Subject: Problem with locking.tdb
> > > >
> > > > Hi
> > > >
> > > > I have just downloaded and compiled the new Samba release 2.2.1a on an
> > > > HP9000 system running HP-UX 11.00. The smb and nmb daemons start
> > > > without any problems, but after running for some time we get error
> > > > messages like this when running the smbstatus command:
> > > >
> > > > tdb(/usr/local/samba/var/locks/locking.tdb): rec_read bad magic
> > > > 0xd9fee666 at offset=5596
> > > > locked file list truncated
> > > >
> > > > Restarting Samba solves the problem, but it consistently returns
> > > > when smbd has been running for some time. (Sometimes it is another of
> > > > the tdb files that is reported as problematic, e.g. connections.tdb.)
> > > >
> > > > Does anyone know of a solution for this problem? I can see that
> > > > other people report other problems with the new locking method using
> > > > tdb files (we previously used Samba 2.0.7, which doesn't cause this
> > > > kind of problem).
> > > >
> > > > Best regards
> > > > Claus Svarer



