[PATCH] GUID index for LDB

Andrew Bartlett abartlet at samba.org
Fri Sep 8 10:30:25 UTC 2017

On Fri, 2017-09-08 at 11:31 +0200, Stefan Metzmacher wrote:
> On 08.09.2017 at 11:07, Andrew Bartlett wrote:
> > 
> > I understand your fears, but to be clear, a database of 100,000 users
> > was only 860MB with the new code.  
> My concern is about large databases that get expanded by the
> transaction: the file becomes more fragmented, because all
> of the new records are appended at the end and the old space
> moves to the freelist.

A very reasonable concern.  Is there a normal strategy to avoid this?

> How does the reindexing code operate in detail?
> Where can I find the related code?


First, remember that we keep an in-memory internal TDB for the duration
of the transaction holding the in-memory copy of the index. 

Unchanged from master (except in the selection of the key) we:

 - Delete the entries from the internal TDB
 - Add empty-record values into the internal TDB for each index found
   in the LDB
 - Traverse and re-key the database:
   - for each entry, if the key has changed:
     - remove record
     - add record under new key
 - Traverse and add/update index entries to/in the internal TDB
 - prepare_commit the transaction (the caller does this)
   - ltdb_index_transaction_commit() then writes out the index values
     (added, deleted, modified) from the internal TDB into the LDB
   - for each entry in the internal TDB:
     - tdb_store() with TDB_REPLACE is used to write a new or updated
       index into the LDB
     - tdb_delete() is used to delete a now-empty index from the LDB
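
To make the order of operations concrete, here is a toy model of the
steps above.  It is only a sketch: plain Python dicts stand in for the
on-disk LDB and the internal TDB, the "DN=", "GUID=" and "@INDEX:" key
prefixes are simplified, and reindex() and its arguments are
hypothetical names, not LDB functions.

```python
# Toy model only: dicts stand in for the on-disk LDB and the in-memory
# internal TDB. None of these names are real LDB/TDB APIs.

def reindex(ldb, indexed_attrs):
    """Model of the reindex steps; mutates and returns `ldb`."""
    # Index entries currently stored in the LDB.
    old_index_keys = [k for k in ldb if k.startswith("@INDEX:")]

    # Start from an empty in-memory index and seed an empty value for
    # every index found in the LDB, so stale ones get deleted at commit.
    mem_index = {k: [] for k in old_index_keys}

    # Traverse and re-key: records go one-in-one-out.
    for key in [k for k in ldb if k.startswith("DN=")]:
        new_key = "GUID=" + ldb[key]["objectGUID"]
        if new_key != key:
            record = ldb.pop(key)      # remove record under the old key
            ldb[new_key] = record      # add record under the new key

    # Traverse again and rebuild the index entries in memory.
    for key, record in ldb.items():
        if not key.startswith("GUID="):
            continue
        for attr in indexed_attrs:
            if attr in record:
                idx = "@INDEX:%s:%s" % (attr, record[attr])
                mem_index.setdefault(idx, []).append(record["objectGUID"])

    # prepare_commit: write the in-memory index out to the LDB.
    for idx, guids in mem_index.items():
        if guids:
            ldb[idx] = sorted(guids)   # models tdb_store() with TDB_REPLACE
        else:
            ldb.pop(idx, None)         # models tdb_delete() of an empty index
    return ldb
```

Note that in this model the index keys already in the LDB are left
untouched until the final loop, which matches the point that they are
only written out at prepare_commit.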

> Do we delete all index related keys first?

See above.  The index keys in LDB are written out during the
transaction prepare_commit. 

In the general case most index values (and so data size) won't change,
but they will for this upgrade (shrinking).  Given that, should we wipe
all the index keys from the LDB at the start of the reindex?  

> Do we delete an old DN= based record before storing it under
> the GUID= key?

Yes, they go one-in-one-out.

> And then rebuild the indexes?


> > In terms of existing deployment scale, the indeed network that Kevin
> > presented at SambaXP is the largest production deployment I know of and
> > before Samba 4.7 significantly larger deployments are infeasible due
> > to the O(n^2) handling of links at join time.  (They take him ~ 30-45
> > minutes).
> > 
> > * 45 domain controllers on 5 continents
> > * 6,252 User/Group objects
> > * 37,625 group memberships
> Is this in total or 45 mins per dc?

I understand that is per DC.  That is why we did all the linked
attribute sorting work during Dec 2016/Jan 2017.

> > > I think I'd prefer making the switch for existing databases an
> > > explicit task for the admin.
> > 
> > I understand the concern, and I changed the index re-write code not to
> > force the set of DSDB_FEATURES_SUPPORTED flag for 4.7 for this reason,
> > meaning only new databases support 'features'.  A similar arrangement
> > could be made if needed.
> > 
> > The reverse concern I have is that if we do that, we have to maintain
> > and test Samba in both modes in perpetuity, particularly as we start to
> > structure our code to try to be GUID rather than string DN based.
> > 
> > Either way, it is Samba (not LDB) that controls when this is enabled,
> > and this and the auto-upgrade is only proposed to be enabled on a major
> > version upgrade, for Samba 4.8.  I will naturally make any change we
> > make here very clear in the WHATSNEW.
> Ok, I think it would include some numbers.
> We may also need to document a procedure that avoids doing
> the upgrade by starting samba. I guess a good strategy might
> be disabling samba and use "samba-tool dbcheck --cross-ncs"
> to trigger an offline upgrade. That would hopefully avoid
> problems for clients which may talk to an unusable dc.

That is a good idea.

> > Most installations (from the discussions on the list) upgrade by
> > joining a new DC to the domain.  The few that don't just upgrade one DC
> > in a pair, then the other.  We hear about these because folks regularly
> > ask if the versions need to be upgraded in sync. 
> > 
> > Garming keeps a 40,000 user test DB around, so I'll get those numbers
> > from that and hope the above helps address your concerns.  
> Please back up the files before the upgrade, so we can redo it if needed.
> What are the sizes before the upgrade?
> Please collect "tdbtool $file info" for all files,
> before and after the upgrade.


> Please also watch top for the memory consumption.
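
The backup and the before/after "tdbtool $file info" collection could
be scripted roughly as below.  This is a sketch under assumptions: it
expects Samba to be stopped (a plain file copy of a live TDB is not a
safe backup), it guesses that the files of interest match "*.ldb"
under the given directory, and snapshot() and tdbtool_info_cmd() are
hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path


def tdbtool_info_cmd(path):
    """Command line for the 'tdbtool $file info' call asked for above."""
    return ["tdbtool", str(path), "info"]


def snapshot(db_dir, out_dir, label, backup=False, run=subprocess.run):
    """Record file size and tdbtool info for each database file.

    `label` is e.g. "before" or "after". The "*.ldb" pattern is an
    assumption about which files matter. `run` is injectable so the
    function can be exercised without tdbtool installed.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(db_dir).rglob("*.ldb")):
        if backup:
            # Plain copy; only safe while nothing has the file open.
            shutil.copy2(path, out_dir / (path.name + ".bak"))
        result = run(tdbtool_info_cmd(path), capture_output=True, text=True)
        report = out_dir / ("%s.%s.info" % (path.name, label))
        report.write_text(
            "size_bytes: %d\n%s" % (path.stat().st_size, result.stdout)
        )
        written.append(report)
    return written
```

Calling snapshot(db_dir, out_dir, "before", backup=True) ahead of the
upgrade and snapshot(db_dir, out_dir, "after") once it finishes leaves
a pair of reports per file that can be compared with diff.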


Thanks for all the feedback!

Andrew Bartlett
Andrew Bartlett                       http://samba.org/~abartlet/
Authentication Developer, Samba Team  http://samba.org
Samba Developer, Catalyst IT          http://catalyst.net.nz/services/samba
