Blocksizes and TDB.

Ira Cooper samba at ira.wakeful.net
Tue Dec 20 09:21:43 MST 2011


On Tue, Dec 20, 2011 at 2:29 AM, Rusty Russell <rusty at rustcorp.com.au> wrote:

> On Mon, 19 Dec 2011 20:48:23 -0500, Ira Cooper <ira at wakeful.net> wrote:
> > On Mon, Dec 19, 2011 at 8:41 PM, Rusty Russell <rusty at rustcorp.com.au> wrote:
> >
> > > On Mon, 19 Dec 2011 19:19:43 -0500, Ira Cooper <ira at wakeful.net> wrote:
> > > > On Mon, Dec 19, 2011 at 5:39 PM, Rusty Russell <rusty at rustcorp.com.au> wrote:
> > > > > I think I prefer to coalesce records on expand.  It's fairly easy to
> > > > > do; sweep the database and try to merge.
> > > >
> > > > It does, the issue I suspect it stops after checking 50 records, if I
> > > > understand the code.  If you get fragmentation sufficient to make those
> > > > records useless... it's toast, is my suspicion.
> > >
> > > Hmm, I don't see any such thing.  Am I looking in the wrong place?
> > >
> > > We repack if we expand in a transaction, but we don't do anything for
> > > non-transaction expands.
> > >
> > > Does the locking tdb do this inside a transaction?  In which case,
> > > perhaps I introduced a bug in 094ab60053bcc0bc3542af8144e394d83270053e
> > > back in April?
> >
> > I don't think the code uses any transactions, but the code running in prod
> > is actually pre-3.6.
> >
> > So.. I don't think that's the issue, unless that is a long standing issue.
>
> Amitay reported a similar bug.  An ldb passed 4G and stuff broke, yet it
> was mainly free space.  We watched it happen again, and it was caused by
> the recovery area expansion.


All of this was done on a copy of locking.tdb.  For those playing along at
home: do NOT run tdbtool against a tdb that is in use.

# ls -la locking.tdb
-rw-r--r-- 1 root root 2294702080 Dec 20 10:32 locking.tdb

# echo free | tdbtool locking.tdb  | wc -l
2080011

2 million element free list.

# echo list | tdbtool locking.tdb | grep magic | grep magic=0x26011999 | wc -l
7274

7k records in use, on a database using no transactions.  I doubt I'm
looking at the same issue.  The machine has been up 6 months now.  So we'll
see how it does.
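
To put those three numbers side by side, here is a small back-of-the-envelope
calculation in Python.  It only uses the figures from the transcript above;
the function and variable names are made up for illustration, and the "free"
line count is taken as-is from wc -l, so it may include a few lines that are
not free-list entries.

```python
# Figures copied from the transcript above.
FILE_SIZE_BYTES = 2294702080   # size of the locking.tdb copy (ls -la)
FREE_LIST_LINES = 2080011      # lines printed by "free" in tdbtool (wc -l)
LIVE_RECORDS = 7274            # lines matching magic=0x26011999 in "list"


def summarize(file_size, free_lines, live_records):
    """Return rough ratios describing how sparse the database is.

    This is plain arithmetic on the observed counts; it does not parse
    a tdb file or call any tdb API.
    """
    return {
        # File size in GiB (2**30 bytes).
        "file_size_gib": file_size / 2**30,
        # How many free-list lines exist per record actually in use.
        "free_entries_per_live_record": free_lines / live_records,
        # File bytes per live record, i.e. how much file each
        # in-use record is "costing" on average.
        "file_bytes_per_live_record": file_size / live_records,
    }


summary = summarize(FILE_SIZE_BYTES, FREE_LIST_LINES, LIVE_RECORDS)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

That works out to roughly 286 free-list lines for every record in use, which
is why this looks like fragmentation rather than real data growth.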

I'll start looking at it more carefully.  But the point of suggesting
blocksizes was to give the allocator an easier job and to reduce
fragmentation over time.  (Fragmentation will still happen, and once it takes
root, it is VERY difficult, if not impossible, to get rid of in the current
code.)
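
A minimal sketch of the blocksize idea, in Python, assuming it means "round
every allocation up to a multiple of a fixed block size".  This is NOT the
TDB allocator; round_up, distinct_allocation_sizes, the 64-byte block size
and the range of request sizes are all hypothetical, chosen only to show the
effect.

```python
def round_up(size: int, blocksize: int) -> int:
    """Round size up to the next multiple of blocksize."""
    if blocksize <= 0:
        raise ValueError("blocksize must be positive")
    # Ceiling division, then scale back up to a byte count.
    return -(-size // blocksize) * blocksize


def distinct_allocation_sizes(request_sizes, blocksize: int) -> int:
    """Count how many different on-disk sizes the requests turn into."""
    return len({round_up(size, blocksize) for size in request_sizes})


# Hypothetical workload: record sizes anywhere from 100 to 399 bytes.
requests = range(100, 400)

# Block size 1 means "no rounding": every request size is its own class.
unrounded = distinct_allocation_sizes(requests, 1)

# With a (hypothetical) 64-byte block size the sizes collapse into a
# handful of classes.
rounded = distinct_allocation_sizes(requests, 64)

print(f"distinct sizes without rounding: {unrounded}")
print(f"distinct sizes with 64-byte blocks: {rounded}")
```

With fewer distinct sizes, a freed record is much more likely to be an exact
fit for a later allocation instead of being split into a small leftover
fragment.  The cost is some wasted space inside each record, so the block
size is a trade-off.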

Note: connections.tdb has the exact same issue, and a similar use
profile... (Well, somewhat.)

I'll work on it, and see what I can find.

Thanks,

-Ira


More information about the samba-technical mailing list