talloc: Other minor issues/queries

Thu Oct 22 08:57:36 UTC 2020

Am 22.10.20 um 03:08 schrieb Jeremy Allison via samba-technical:
> On Wed, Oct 21, 2020 at 06:51:04PM -0500, Arran Cudbard-Bell wrote:
>>
>> Would references would be on the chopping block for a new major
>> release also? I have to say that's one of the features we also use
>> but one we could easily replicate internally and wouldn't really mourn.
> 
> I would *love* to get rid of references. I once had a long
> conversation with Josh Bloch:
> 
> https://en.wikipedia.org/wiki/Joshua_Bloch
> 
> who used to be an office-mate of mine (one of
> the benefits of working for Google :-). As he's
> an expert in API design I ran the talloc API
> past him.
> 
> He said it was mostly OK but *hated* talloc_reference() :-).
> 
> Unfortunately we'd need to get rid of it from Samba
> first. Something I've been working (slowly:-) towards
> for years...
> 
>> Now we're discussing major revs, there are a couple of other issues 
>> we've run into that it'd be good to get comment on.
>>
>> First, is that the lack of parent pointer for the majority of chunks makes
>> any call that needs to access the parent ctx, very slow, when there 
>> are a large number of chunks at the same level.
>>
>> I'm assuming it's a trade off between cheap reallocs and cheap
>> talloc_steal/talloc_parent_*.  Is allocing a bunch of chunks off of a 
>> context that gets realloced a common pattern for Samba? Am I 
>> missing something else that makes recording the parent in each chunk
>> painful?
> 
> It's probably a size optimization to avoid bloating out
> the talloc chunk header more than we need.

We always have a the parent pointer, but it's left as NULL
for all but the first child.

Currently the chunk header is 88 bytes (padded to a 16 byte boundary => 96 bytes)
on 64bit architectures, which is more than the typical cacheline size of 64 bytes.

Removing the memlimit pointer would shrink it to 80 bytes, but I guess it's to hard to
make it fit into 64 bytes...

There's also an alignment gad of 4 bytes after 'unsigned flags',
so we're currently wasting 12 bytes for each chunk.

It means that we could change flags to uint64_t or split of the magic into
a separate uint32_t. Both would allow expanding the flags mask from 0xF to
more, so we could have flags regarding the a const name, or it a chunk
was realloc'ed.

We could start with filling the parent pointer on each child first
on the first realloc of the parent we'll mark the parent with FLAG_REALLOCED
and the children with FLAG_INDIRECT_PARENT.

> We tend not to reparent or realloc much. Just allocate
> off a context and children and then free it.

Actually I think in some parts we reparent (talloc_steal/talloc_move) a lot
in others (e.g. ldb or ndr) we realloc a bit more.

>> I mention this because we had to strip out almost all calls to
>> talloc_steal and talloc_parent*, because they featured so heavily
>> in code profiling.  If it really is just a tradeoff between realloc and
>> talloc_parent* it might be good to revisit how parents are 
>> recorded.
>>
>> Second, is expandable pools.
>>
>> In FreeRADIUS the instance data for modules is only protected 
>> when a user sets a special environmental variable to specify the pool
>> size for all instance data memory. The user has to guess at how 
>> big this pool needs to be (which is pretty terrible) but it's the only way 
>> to handle this with the current version of talloc.
>>
>> When we're building up the dictionaries we have a similar problem
>> in that we need to guess at how big the pool needs to be because
>> we can't expand it.  This leaves the pool massively oversized in the
>> majority of cases.
> 
> Pools are really designed to be used for *known* data sizes.
> 
> Look at how we use them for filename in cp_smb_filename()
> via talloc_pooled_object().
> 
> Remember, talloc is our internal library that others found
> useful, so we design it around our needs first.
> 
> Selfish, but that's just the way it is :-).
> 
>> Having talloc automatically expand pools would be amazing 
>> (imagining a linked list of fixed size malloced chunks).  I know there's
>> additional complexity here, but it does seem to be limited to very
>> specific areas of the code.
>> Would this functionality ever be considered?
> 
> Be very clear about what you want this to look like.
> 
> Are pools the right thing ?

Maybe we could have a FLAG_FIXED_POOL, which avoids the fallback to head memory.

But I guess we'd need something implemented a bit differently,

Currently we're very bad at reusing free memory within pools, we don't maintain a free list.
But that's ok for our two main use cases talloc_stackframe_pool()
and talloc_pooled_object() for both our main goal is to avoid calling malloc/free more than needed.
But they don't really implement a SLAB allocator.
https://en.wikipedia.org/wiki/Slab_allocation

The Linux kernel has a separation of 'struct kmem_cache' and 'mempool_t', see:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/slab.h
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/slab_def.h
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/mempool.h

So there's 'mempool_t *mempool_create_slab_pool(int min_nr, struct kmem_cache *kc)'
and 'mempool_t *mempool_create_kmalloc_pool(int min_nr, size_t size)', but that's all
for fixed size objects.

Maybe we could abstract the low level allocation (malloc,realloc,free) in talloc and just
implement the other allocation schemes below the main talloc logic.

In order to be able to handle much higher IOPS in our SMB server, we need to find ways
to optimize our talloc/tevent_req handling.

These are some ideas, but it will be a long road to get there, so I guess we need to
find incremental mini steps...

metze

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20201022/e879ee68/signature.sig>