[PATCH] GUID index for LDB

Stefan Metzmacher metze at samba.org
Fri Sep 8 09:31:45 UTC 2017

Am 08.09.2017 um 11:07 schrieb Andrew Bartlett:
> On Fri, 2017-09-08 at 10:36 +0200, Stefan Metzmacher wrote:
>> Am 08.09.2017 um 05:56 schrieb Andrew Bartlett:
>>> Control points for choice of index mode
>>> ---------------------------------------
>>> The choice of index and TDB key mode is made (for example, by
>>> Samba) based on entries in the @INDEXLIST DN:
>>> dn: @INDEXLIST
>>> @IDXGUID: objectGUID
>>> By default, the original DN format is used.
>> So we're upgrading the database on first use with the new code?
> Yes, when Samba 'fixes' the @INDEXLIST on schema load this will be
> triggered. 
>> My fear with this is that a simple package upgrade will make
>> a dc with a large database unusable for quite some time.
>> Can you please check the cost of an upgrade for databases with
>> 1.) 5000 users, 5000 computers and 5000 groups
>> 2.) 20000 users, 20000 computers and 20000 groups
>> 3.) with the numbers of the largest known customer size
>> I guess rewriting the whole database consumes quite a lot of CPU
>> and also memory. A server may run out of memory while doing this
>> as we need more than twice the size of all sam.ldb* databases together.
> I understand your fears, but to be clear, a database of 100,000 users
> was only 860MB with the new code.  
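To put the "more than twice the size of all sam.ldb* databases" rule of thumb into numbers for a given DC, a minimal sketch, assuming the usual AD DC layout of private/sam.ldb plus partition files under private/sam.ldb.d/*.ldb (the helper names are hypothetical, and the factor of 2 is only the rough estimate from this thread, not a measured bound):

```python
import glob
import os


def sam_ldb_files(private_dir):
    """Return sam.ldb plus the partition files under sam.ldb.d/.

    Assumes the usual layout (private/sam.ldb and private/sam.ldb.d/*.ldb);
    adjust the patterns if a deployment differs.
    """
    candidates = [os.path.join(private_dir, "sam.ldb")]
    candidates += sorted(glob.glob(os.path.join(private_dir, "sam.ldb.d", "*.ldb")))
    # Only count regular files that actually exist.
    return [p for p in candidates if os.path.isfile(p)]


def upgrade_headroom_bytes(private_dir, factor=2.0):
    """Rough lower bound on memory needed for a full rewrite: factor x total size."""
    total = sum(os.path.getsize(p) for p in sam_ldb_files(private_dir))
    return int(total * factor)
```

Calling `upgrade_headroom_bytes("/path/to/private")` before the upgrade gives a quick figure to compare against free memory; other databases such as secrets.ldb are deliberately not counted.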

My concern is about large databases, which get expanded by the
transaction. This means the file becomes more fragmented: all
of the new records are appended at the end and the old space
moves to the freelist.
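A toy model of the effect described above, assuming every record is rewritten once inside a single transaction, each new copy is appended at the end of the file, and each old slot only goes to the freelist (index records and any change in key length are ignored; this is not TDB's actual allocator):

```python
def rewrite_all_in_transaction(record_sizes):
    """Toy model of rewriting every record in one transaction.

    record_sizes: sizes in bytes of records laid out contiguously.
    Returns (final_file_size, bytes_on_freelist).
    """
    file_size = sum(record_sizes)
    freed = 0
    for size in record_sizes:
        freed += size      # old copy's space moves to the freelist
        file_size += size  # new copy is appended at the end of the file
    return file_size, freed
```

With records of 100, 200 and 300 bytes the file grows from 600 to 1200 bytes, half of it on the freelist, which is the roughly 2x growth the memory concern above is about.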

How does the reindexing code operate in detail?
Where can I find the related code?

Do we delete all index related keys first?

Do we delete an old DN= based record before storing it under
the GUID= key?

And then rebuild the indexes?
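One possible ordering of the three steps asked about above, as a hypothetical sketch over an in-memory dict rather than LDB's code. The key formats are illustrative assumptions: `DN=<dn>` for old-style record keys, `GUID=<guid>` for new-style keys, and `@INDEX:<attr>:<value>` for index records:

```python
def rekey_and_reindex(store, indexed_attrs):
    """Hypothetical re-key + re-index pass over a dict-based store.

    store maps keys to values; records are dicts with 'dn', 'objectGUID'
    and attribute values, index records are lists of record keys.
    """
    # 1. Delete every existing index record first.
    for key in [k for k in store if k.startswith("@INDEX:")]:
        del store[key]

    # 2. Move each DN= record to its GUID= key, removing the old key.
    for key in [k for k in store if k.startswith("DN=")]:
        record = store.pop(key)
        store["GUID=" + record["objectGUID"]] = record

    # 3. Rebuild the indexes so they point at the new GUID= keys.
    for key, record in list(store.items()):
        if not key.startswith("GUID="):
            continue
        for attr in indexed_attrs:
            if attr in record:
                idx_key = "@INDEX:%s:%s" % (attr, record[attr])
                store.setdefault(idx_key, []).append(key)
    return store
```

Whether the real code deletes the DN= record before or after storing the GUID= copy matters for peak file size, which is exactly what the questions above try to pin down.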

> In terms of existing deployment scale, the Indeed network that Kevin
> presented at SambaXP is the largest production deployment I know of, and
> before Samba 4.7 significantly larger deployments are infeasible due
> to the O(n^2) handling of links at join time.  (They take him ~30-45
> minutes.)
> * 45 domain controllers on 5 continents
> * 6,252 User/Group objects
> * 37,625 group memberships

Is this in total, or 45 minutes per DC?

>> I think I'd prefer making the switch for existing databases an
>> explicit task for the admin.
> I understand the concern, and I changed the index re-write code not to
> force setting the DSDB_FEATURES_SUPPORTED flag for 4.7 for this reason,
> meaning only new databases support 'features'.  A similar arrangement
> could be made if needed.
> The reverse concern I have is that if we do that, we have to maintain
> and test Samba in both modes in perpetuity, particularly as we start to
> structure our code to try to be GUID rather than string DN based.
> Either way, it is Samba (not LDB) that controls when this is enabled,
> and this and the auto-upgrade is only proposed to be enabled on a major
> version upgrade, for Samba 4.8.  I will naturally make any change we
> make here very clear in the WHATSNEW.

Ok, I think it should include some numbers.

We may also need to document a procedure that avoids triggering
the upgrade simply by starting samba. I guess a good strategy might
be to disable samba and use "samba-tool dbcheck --cross-ncs"
to trigger an offline upgrade. That would hopefully avoid
problems for clients that might otherwise talk to an unusable DC.

> Most installations (from the discussions on the list) upgrade by
> joining a new DC to the domain.  The few that don't just upgrade one DC
> in a pair, then the other.  We hear about these because folks regularly
> ask if the versions need to be upgraded in sync. 
> Garming keeps a 40,000-user test DB around, so I'll get those numbers
> from that, and hope the above helps address your concerns.

Please back up the files before the upgrade, so we can redo it if needed.

What are the sizes before the upgrade?
Please collect "tdbtool $file info" for all files,
before and after the upgrade.

Please also watch the memory consumption in top.
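The backup and the before/after "tdbtool $file info" collection could be scripted roughly as below. This is a minimal sketch: the helper names are hypothetical, it assumes samba is stopped (a TDB file copied while it is being written may not be a consistent copy), and it assumes tdbtool is on the PATH:

```python
import os
import shutil
import subprocess


def backup_files(files, backup_dir):
    """Copy each file (with metadata) into backup_dir; return the copies.

    Only safe with samba stopped, so no writer touches the TDB files.
    """
    os.makedirs(backup_dir, exist_ok=True)
    copies = []
    for path in files:
        dest = os.path.join(backup_dir, os.path.basename(path))
        shutil.copy2(path, dest)
        copies.append(dest)
    return copies


def tdbtool_info_commands(files):
    """Build one 'tdbtool <file> info' command per file."""
    return [["tdbtool", path, "info"] for path in files]


def collect_tdbtool_info(files):
    """Run each command and map the file path to tdbtool's output."""
    return {
        cmd[1]: subprocess.run(cmd, capture_output=True, text=True).stdout
        for cmd in tdbtool_info_commands(files)
    }
```

Running `collect_tdbtool_info()` on sam.ldb and the sam.ldb.d/*.ldb partition files once before and once after the upgrade gives two reports to diff.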

> Thankfully, the new code (actually everything since 1.2.2) is much
> faster to re-index than the old, as I removed an O(n^2) loop. 
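As a generic illustration of why removing a quadratic step matters so much at re-index time (this is not the actual loop removed from LDB, which is not shown in this thread), compare de-duplicating index entries with a list scan versus a set lookup:

```python
def build_index_quadratic(values):
    """Each membership test scans the list built so far: O(n^2) overall."""
    index = []
    for v in values:
        if v not in index:  # linear scan of a Python list
            index.append(v)
    return index


def build_index_linear(values):
    """Same result, using a set for membership tests: O(n) expected."""
    seen = set()
    index = []
    for v in values:
        if v not in seen:
            seen.add(v)
            index.append(v)
    return index
```

Both return the same order-preserving result; only the cost per insert differs, which is what dominates when tens of thousands of objects are re-indexed.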

More information about the samba-technical mailing list