Problem with locking.tdb

MCCALL,DON (HP-USA,ex1) don_mccall at hp.com
Tue Aug 14 14:46:36 GMT 2001


Hi Claus,
If you are using log.%m (so you get a separate log for each machine) then I
would want the log for the samba server (log.<sambaservername>), as the
smbstatus command is going to be attaching using the servername.
Thanks,
Don


-----Original Message-----
From: Claus Svarer [mailto:csvarer at nru.dk]
Sent: Tuesday, August 14, 2001 4:58 AM
To: MCCALL,DON (HP-USA,ex1)
Subject: Re: Problem with locking.tdb


Hi Don,

I will post to samba-technical in the future. I will see whether I get the
tdb error message for connections.tdb again. I restarted Samba yesterday and
don't get it now, although yesterday I was getting the error message
consistently (the only solution was to restart the daemon). Which log file
do you want at level 10? I don't know which connection/client causes the
error. Should it be just the log for the SMB "root" daemon, or the logs for
all the clients?

Best regards
Claus



"MCCALL,DON (HP-USA,ex1)" wrote:

> Hi Claus,
> No, sorry.  What the message is telling us is that the entry for the
> tdb.magic field for the record we were trying to read had a bad magic
> number; this is sort of a primitive check to make sure that the record we
> are reading is valid; it SHOULD contain one of
> TDB_MAGIC (0x26011999U)
> OR, if the record is deleted,
> TDB_DEAD_MAGIC (0xFEE1DEAD)
> OR, if the record is free to be used,
> TDB_FREE_MAGIC
> In this case, it is NEITHER - being reported as 0xd9fee666.
>
> Since we never intentionally apply this value to a tdb record in the magic
> field, it probably indicates that the record passed to rec_read() (from
> whence this message is being sent) is invalid...
>
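The magic-number check described above can be sketched in C as follows. This is an illustration only, not the code in tdb.c: the enum and the helper classify_magic() are hypothetical names. The values of TDB_MAGIC and TDB_DEAD_MAGIC are the ones quoted in the message; the message gives no value for TDB_FREE_MAGIC, so the sketch ASSUMES it is the bitwise complement of TDB_MAGIC.

```c
#include <stdint.h>

/* Values quoted in the message above. */
#define TDB_MAGIC      0x26011999U
#define TDB_DEAD_MAGIC 0xFEE1DEADU
/* ASSUMPTION: the thread gives no value for this marker; the bitwise
 * complement of TDB_MAGIC is used here for illustration only. */
#define TDB_FREE_MAGIC (~TDB_MAGIC)

enum rec_state { REC_LIVE, REC_DEAD, REC_FREE, REC_INVALID };

/* Hypothetical helper: classify a record by its 32-bit magic field. */
static enum rec_state classify_magic(uint32_t magic)
{
    if (magic == TDB_MAGIC)
        return REC_LIVE;
    if (magic == TDB_DEAD_MAGIC)
        return REC_DEAD;
    if (magic == (uint32_t)TDB_FREE_MAGIC)
        return REC_FREE;
    return REC_INVALID;   /* none of the three known markers */
}
```

As plain arithmetic, ~0x26011999 is 0xd9fee666, the value in the error message. If the assumption about TDB_FREE_MAGIC holds, the record being read may have been one marked free rather than one filled with random data. That is a reading of the numbers, not something this thread confirms.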
> I note that you saw this previously in an earlier post, and the bad magic
> number (0xd9fee666)  reported was the SAME, with a different offset
> (offset=54016).
>
> I don't know the significance of this, but I have cc'ed the samba-technical
> mailing list in hopes one of the real Samba heads has some ideas.  You
> should probably post to that list in the future, or at least cc them when
> you write to me, as you'll have a much better chance of someone
> recognizing/understanding the situation than with just my poor skills.
>
> IF YOU ARE CONSISTENTLY getting this error message whenever you run
> smbstatus now (it wasn't just a one-time deal), then it probably means
> there is a corrupt record in the connections.tdb database.
> It would be useful to be able to FIND that record, and maybe determine
> which connection it was for, to get more info on the conditions that might
> have caused this.
> I don't have much time right now, but I have been interested in
> investigating how tdbtool works anyway - so if you could send me offlist
> the connections.tdb file and a level 10 debug of your smbstatus failing
> (assuming that you can reproduce this), I'll try to take a look at it and
> see if I can spot anything.
>
> Hope this helps,
> Don
>
> -----Original Message-----
> From: Claus Svarer [mailto:csvarer at nru.dk]
> Sent: Monday, August 13, 2001 1:16 PM
> To: MCCALL,DON (HP-USA,ex1)
> Subject: Re: Problem with locking.tdb
>
> Hi Don
>
> Thanks for all your help so far. The new version downloaded from CVS has
> now been running since Thursday morning without any problems. Today,
> though, we got a new error, with a message like:
>
> Samba version 2.2.1
> Service      uid      gid      pid     machine
> ----------------------------------------------
> tdb(/usr/local/samba/var/locks/connections.tdb): rec_read bad magic 0xd9fee666 at offset=24256
>
> No locked files
>
> so there must still be a problem with the connection database. Do you have
> any idea what has caused that?
>
> The Samba server still functions, but when we run the smbstatus command we
> get that error message.
>
> Best regards
> Claus Svarer
>
> "MCCALL,DON (HP-USA,ex1)" wrote:
>
> > fixed in cvs head last night by Jerry....
> > Sorry about that!
> > Don
> >
> > -----Original Message-----
> > From: Claus Svarer [mailto:csvarer at nru.dk]
> > Sent: Wednesday, August 08, 2001 4:20 AM
> > To: MCCALL,DON (HP-USA,ex1)
> > Subject: Re: Problem with locking.tdb
> >
> > Hi Don
> >
> > You are right: after compiling the new CVS version I get "use mmap = No"
> > when I grep the testparm output, so that should be OK. I then tried to
> > run it, and that also seems to be OK. To test the locking I then ran
> > smbstatus. From that I get output like:
> >
> > fog:/users/csvarer 26 : smbstatus | more
> > INFO: Debug class all level = 3   (pid 8529 from pid 8529)
> > Processing section "[homes]"
> > Processing section "[netlogon]"
> > Processing section "[profiles]"
> > Processing section "[printers]"
> > Processing section "[print$]"
> > Processing section "[fax]"
> > Processing section "[tmp]"
> > Processing section "[programs]"
> > Processing section "[prg_data]"
> > Processing section "[f-prot-v5]"
> > Processing section "[n_adm]"
> > Processing section "[nruweb]"
> > Processing section "[brain99]"
> > Processing section "[all]"
> > Failed to open byte range locking database
> > ERROR: Failed to initialise locking database
> >
> > Samba version 2.2.1
> > Service      uid      gid      pid     machine
> > ----------------------------------------------
> > ttsch        ttsch    users     8508   scl3     (130.226.104.223) Wed Aug 8 10:11:30 2001
> > csvarer      csvarer  users     8419   csport   (130.226.104.174) Wed Aug 8 10:10:53 2001
> > f-prot-v5    pc_adm   users     8419   csport   (130.226.104.174) Wed Aug 8 10:10:46 2001
> > n_adm        pia      users     8500   pia      (130.226.104.212) Wed Aug 8 10:11:05 2001
> > dorthe       dorthe   users     8512   dorthe   (130.226.104.213) Wed Aug 8 10:11:36 2001
> > saznar       saznar   users     8413   euler    (130.226.104.172) Wed Aug 8 10:12:03 2001
> > csvarer      csvarer  users     8419   csport   (130.226.104.174) Wed Aug 8 10:10:49 2001
> > programs     khusted  users     8491   khusted  (130.226.104.206) Wed Aug 8 10:11:00 2001
> > brain99      csvarer  users     8419   csport   (130.226.104.174) Wed Aug 8 10:10:52 2001
> > f-prot-v5    saznar   users     8413   euler    (130.226.104.172) Wed Aug 8 10:10:40 2001
> > tmp          csvarer  users     8419   csport   (130.226.104.174) Wed Aug 8 10:10:52 2001
> > programs     csvarer  users     8419   csport   (130.226.104.174) Wed Aug 8 10:10:52 2001
> > prg_data     csvarer  users     8419   csport   (130.226.104.174) Wed Aug 8 10:10:52 2001
> >
> > Can't initialise locking module - exiting
> >
> >
> > I believe this means that there must still be some problem with the
> > locking database? Or do you know whether it is just a result of the new
> > mmap setup?
> >
> > Claus
> >
> >
> >
> >
> >
> > "MCCALL,DON (HP-USA,ex1)" wrote:
> >
> > Hi Claus,
> > I believe that what Jeremy did was to add an smb.conf parameter, use mmap,
> > which for HP should default to no.  So configure still finds that mmap
> > exists, and compiles Samba so that it CAN use mmap, but whether it does or
> > not is controlled by the smb.conf parameter.  That way, when we DO get a
> > unified cache on HPUX, we won't have to change any code, but just turn on
> > the smb.conf parameter to take advantage of mmap.
> > You can verify this by running testparm after you have built the product
> > and grep for mmap....
> > I haven't gotten the latest cvs installed myself, so let me know...
> > Don
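The split between a compile-time capability and a run-time switch that Don describes could look roughly like this in C. This is a sketch under stated assumptions, not Samba's code: struct settings and mmap_enabled() are hypothetical names standing in for wherever the parsed "use mmap" value lives.

```c
#include <stdbool.h>

/* Defined by configure when the platform provides mmap
 * (the line seen in include/config.h). */
#define HAVE_MMAP 1

/* Hypothetical stand-in for the parsed smb.conf value of "use mmap". */
struct settings {
    bool use_mmap;
};

/* mmap is used only if support was compiled in AND the parameter enables
 * it; without HAVE_MMAP the parameter has no effect. */
static bool mmap_enabled(const struct settings *s)
{
#ifdef HAVE_MMAP
    return s->use_mmap;
#else
    (void)s;
    return false;
#endif
}
```

With this shape, a build where HAVE_MMAP is defined still avoids mmap as long as the parameter is off, which matches the testparm check suggested above.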
> > -----Original Message-----
> > > From: Claus Svarer [mailto:csvarer at nru.dk]
> > Sent: Tuesday, August 07, 2001 11:07 AM
> > To: MCCALL,DON (HP-USA,ex1)
> > Subject: Re: Problem with locking.tdb
> >
> > Hi Don
> >
> > > I have now downloaded this CVS branch, installed it, run the configure
> > > script, and looked into include/config.h. It seems that it still
> > > believes it should be using mmap, since the two lines:
> > >
> > > #define HAVE_MMAP 1
> > >
> > > are still included. Is that correct, or should I comment them out
> > > manually?
> >
> > BR
> > Claus
> >
> > "MCCALL,DON (HP-USA,ex1)" wrote:
> >
> > > Hi Claus,
> > > > Jeremy and I talked about this, and bottom line, because HP-UX doesn't
> > > > have a unified cache between mmapped files and regular files, there is
> > > > currently no way to ensure that a tdb file is not eventually going to
> > > > be 'corrupt'.
> > > > So the only way to ensure that this doesn't happen is to turn off mmap
> > > > usage for samba on HP-UX till we get a unified cache sometime in the
> > > > future.
> > > > Please pull down the latest CVS version of SAMBA_2_2 and build that -
> > > > Jeremy has checked in fixes that will disable mmap usage on HP-UX, as
> > > > well as a few corner cases where .tdb files got messed up for other
> > > > reasons.  See if you still have the same issue after that.
> > > Thanks,
> > > Don
> > > -----Original Message-----
> > > > From: Claus Svarer [mailto:csvarer at nru.dk]
> > > Sent: Thursday, August 02, 2001 7:29 AM
> > > To: MCCALL,DON (HP-USA,ex1)
> > > Subject: Re: Problem with locking.tdb
> > >
> > > Hi Don
> > >
> > > > Thank you for your help. I believe I should report on our experience
> > > > using the newly compiled Samba server (with MAP_FIXED) for a week. It
> > > > ran without any problems until this morning, but then we got (from
> > > > smbstatus):
> > > >
> > > > Samba version 2.2.1a
> > > > Service      uid      gid      pid     machine
> > > > ----------------------------------------------
> > > > n_adm        pia      users    10742   pia      (130.226.104.212) Thu Aug 2 12:27:15 2001
> > > > tdb(/usr/local/samba/var/locks/connections.tdb): rec_read bad magic 0xd9fee666 at offset=54016
> > > >
> > > > and we have some strange problems connecting to the Samba server (not
> > > > surprising, with an error in the connection database). Restarting
> > > > Samba solves the problem.
> > > >
> > > > Do you have any idea what the reason for this problem could be? Do you
> > > > know whether the connection database is created in the same way,
> > > > although mmap is not used?
> > > >
> > > > Once more, thanks for helping to solve the first problem. Samba runs
> > > > very nicely as a Windows 2000 domain logon server.
> > >
> > > Best regards
> > > Claus Svarer
> > >
> > > "MCCALL,DON (HP-USA,ex1)" wrote:
> > >
> > > > Hi Claus,
> > > > > In fact, I added a number of TDB_LOG statements to tdb.c, where
> > > > > tdb_mmap (which calls mmap) is located, to try to track this down,
> > > > > and in fact I DO see the issue with mmap periodically returning a pa
> > > > > of 0.  This is NOT (according to the man page) supposed to be
> > > > > possible, in the way that samba is calling mmap, and in fact samba
> > > > > in several places depends on this not happening - mmap is supposed
> > > > > to return MAP_FAILED in case of a problem; it should never return
> > > > > NULL as a pa if the addr that you pass to it is 0.
> > > > > This is why, in my first message (answering your second message - I
> > > > > must live backwards in time, like Merlin ;-)  just wish I was as
> > > > > smart!) I asked you to try adding the MAP_FIXED flag to the call to
> > > > > mmap() in tdb.c.
> > > > > If you don't specify MAP_FIXED, then the pa that is returned is left
> > > > > up to the mmap code, and it has more leeway as to where to put the
> > > > > chunk of memory you are asking for.  Doing this allowed me to
> > > > > consistently (so far) pass the smbtorture LOCK1-4 tests where I was
> > > > > having failures (LOCK1) before; but this is not involving the
> > > > > connections database, but the several locking databases instead.
> > > >
> > > > I don't remember, which version of Samba are you running? 2.2.1a?
> > > >
> > > > So let me know,
> > > > Thanks
> > > > Don
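The return-value problem Don describes (mmap handing back address 0 where only MAP_FAILED signals an error) suggests a defensive wrapper along the following lines. This is a minimal sketch, not Samba's tdb_mmap: map_shared_or_null() is a hypothetical name, and treating a mapping at address 0 as a failure is an illustrative policy choice, not something the thread prescribes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical wrapper (not Samba's tdb_mmap): map `len` bytes of `fd`
 * shared and read/write, letting the system choose the address.
 * Returns NULL on any failure, so callers have one value to test. */
static void *map_shared_or_null(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (p == MAP_FAILED)
        return NULL;          /* the documented failure value */
    if (p == NULL) {
        /* The case described above: a mapping reported at address 0.
         * Code that tests the pointer against zero cannot tell this
         * apart from "not mapped", so release it and report failure. */
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

A caller that gets NULL back would then use ordinary read/write calls on the file instead of the mapping.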
> > > >
> > > > -----Original Message-----
> > > > > From: Claus Svarer [mailto:csvarer at nru.dk]
> > > > Sent: Wednesday, July 25, 2001 6:46 AM
> > > > To: MCCALL,DON (HP-USA,ex1)
> > > > Subject: Re: Problem with locking.tdb
> > > >
> > > > Hello Don,
> > > >
> > > > > Thanks for your advice, it seems to have solved the problem. At
> > > > > least it runs for longer periods now without any. It has now been
> > > > > running for 3 hours without any problems.
> > > > >
> > > > > I have previously had two other problems with the HP mmap feature; I
> > > > > don't know whether they could be related to this problem.
> > > >
> > > > Problem 1:
> > > > > I solved the first problem by changing the mmap call from MAP_SHARED
> > > > > to MAP_PRIVATE. I know that this also changes the behaviour of mmap,
> > > > > but that did not matter in that program, because it used mmap only
> > > > > for reading the contents of the files, not for changing them. The
> > > > > program simply didn't work when mmap was called with MAP_SHARED; I
> > > > > was not able to track down why.
> > > >
> > > > Problem 2:
> > > > > In the same program mmap was used for mapping several different
> > > > > memory segments within the same file. If more than about 34 memory
> > > > > regions were allocated within the same file, depending somewhat on
> > > > > the size of the file and of the memory chunks (an image file of
> > > > > approx. 48 Mbyte with 1.1 Mbyte memory chunks), it suddenly began to
> > > > > return zero (as I remember it; this was several years ago, with
> > > > > HP-UX 10.20).
> > > >
> > > > > Do you have any idea whether either of these problems is related to
> > > > > what we see now with mmap and the tdb feature?
> > > >
> > > > Best regards
> > > > Claus Svarer
> > > >
> > > > > PS There is one thing that I wonder about. I asked about a similar
> > > > > problem with version 2.2.0 two months ago, and got several emails
> > > > > back saying that they had no problem with that Samba version and
> > > > > HP-UX. Do you have any idea whether any of the kernel parameters
> > > > > could influence the behaviour of the mmap function (also related to
> > > > > the second problem I am describing)?
> > > >
> > > > "MCCALL,DON (HP-USA,ex1)" wrote:
> > > >
> > > > > Hello Claus,
> > > > > We are seeing this as well.  running the smbtorture test LOCK1
> > > > > will show this (and other issues as well).  I'm not done with
> > > > > my investigation, but at the moment, it APPEARS to be an issue
> > > > > with the mmap implementation on HP-UX, and the fact that HP-UX
> > > > > uses different caches for memory mapped file access as opposed
> > > > > to filesystem (read/write) access to files.  Samba 'fails thru'
> > > > > to read/write when it detects that the tdb->map_ptr value is
> > > > > zero for tdb_write and tdb_read, but some other tdb calls (I
> > > > > THINK) are still accessing via the mmapped address.  According to
> > > > > the man page for mmap, mmap should never map to 0, but additional
> > > > > debug statements I have added to the code indicate that in some
> > > > > circumstances, IT IS doing this.  Until we
> > > > > get this resolved, you can probably workaround this problem
> > > > > by changing the lines (there are 2 of them):
> > > > >
> > > > > #define HAVE_MMAP 1
> > > > >
> > > > > in the include/config.h file
> > > > >
> > > > > and then doing a "make clean",
> > > > > and then a "make"
> > > > > to rebuild samba WITHOUT mmap support.
> > > > >
> > > > > Let me know if this doesn't do it for you.
> > > > > Don
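The "fails thru to read/write" behaviour Don mentions, where a zero map pointer means "use ordinary file I/O", can be sketched as below. This is an illustration under stated assumptions: struct db_ctx and db_read() are hypothetical, simplified stand-ins and do not reproduce the actual tdb structures or the tdb_read signature.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, simplified stand-in for a tdb context. */
struct db_ctx {
    int    fd;        /* open database file */
    char  *map_ptr;   /* mapped view of the file, or NULL when not mapped */
    size_t map_size;  /* length of the mapping in bytes */
};

/* Copy `len` bytes at file offset `off` into `buf`.
 * Uses the mapping when there is one, otherwise falls through to pread().
 * Returns 0 on success, -1 on error, short read, or out-of-range access. */
static int db_read(const struct db_ctx *db, off_t off, void *buf, size_t len)
{
    if (db->map_ptr != NULL) {
        if (off < 0 || (size_t)off > db->map_size ||
            len > db->map_size - (size_t)off)
            return -1;
        memcpy(buf, db->map_ptr + off, len);
        return 0;
    }
    return pread(db->fd, buf, len, off) == (ssize_t)len ? 0 : -1;
}
```

The weakness Don points at follows directly from this shape: if every access goes through one such function, a zero map pointer is handled, but any call that dereferences the mapped address without the check would misbehave when mmap returns 0.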
> > > > >
> > > > > -----Original Message-----
> > > > > > From: Claus Svarer [mailto:csvarer at nru.dk]
> > > > > Sent: Tuesday, July 24, 2001 7:22 AM
> > > > > To: samba-technical at lists.samba.org
> > > > > Subject: Problem with locking.tdb
> > > > >
> > > > > Hi
> > > > >
> > > > > > I have just downloaded and compiled the new Samba release 2.2.1a
> > > > > > on an HP9000 system running HP-UX 11.00. The smb and nmb daemons
> > > > > > start without any problems, but after running for some time we
> > > > > > get error messages like the following when running the smbstatus
> > > > > > command:
> > > > >
> > > > > tdb(/usr/local/samba/var/locks/locking.tdb): rec_read bad magic
> > > > > 0xd9fee666 at offset=5596
> > > > > locked file list truncated
> > > > >
> > > > > > Restarting Samba solves the problem, but it consistently returns
> > > > > > when smbd has been running for some time. (Sometimes it is
> > > > > > another of the tdb files that is reported as problematic, e.g.
> > > > > > connections.tdb.)
> > > > >
> > > > > > Does anyone know of a solution to this problem? I can see that
> > > > > > other people report other problems with the new locking method
> > > > > > using tdb files (we previously used Samba 2.0.7, which doesn't
> > > > > > cause this kind of problem).
> > > > >
> > > > > Best regards
> > > > > Claus Svarer



