[PATCH] LMDB full patch set

Mon Apr 16 15:57:34 UTC 2018

Stefan Metzmacher wrote:
> Am 16.04.2018 um 16:43 schrieb Howard Chu via samba-technical:
>> Simo wrote:
>>> On Mon, 2018-04-16 at 14:36 +0200, Volker Lendecke via samba-technical
>>> wrote:
>>>> On Fri, Apr 13, 2018 at 07:00:35AM +1200, Andrew Bartlett via
>>>> samba-technical wrote:
>>>>> It seems so, and there isn't much we can do.  If you think about a
>>>>> fork() based memory model then cleaning up the internal state will only
>>>>> dirty memory anyway.  They don't provide a 'clean up after fork()'
>>>>> function, but do strictly require that you not touch an environment
>>>>> after a fork().
>>>>
>>>> Asked Howard directly about fork handling. His reply contains:
>>>>
>>>>> After discussing this on -devel IRC, I think we could add an
>>>>> mdb_env_postfork() call that the child can use to make its copy of
>>>>> the env valid. It would have the same restrictions as
>>>>> mdb_env_set_mapsize() - there must not be any active txns in the
>>>>> parent process at the time of fork - they'd just be memleaks in the
>>>>> child. The child must call this immediately, and cannot call any
>>>>> other LMDB APIs until this is done. But assuming those conditions
>>>>> are met, we can continue to use the existing file descriptors and
>>>>> mmaps without needing to tear down and reopen.
>>>>
>>>> Would that help us?
>>>
>>> It would require to gate forks on the fact no transactions are open.
>>> Can we guarantee no transactions in parents or block forks?
>>>
>>> Note: If we need to fork a utility within the transaction we do not
>>> need to gate it as long as we have call semantics that enforce an exec
>>> or a panic I guess.
>>
>> Yes, this is missing a little prior context, in my earlier reply to Volker:
>>
>>>> One question that came up: What to do
>>>> with an open lmdb after fork? Samba does fork a lot, and with tdb we
>>>> just reopen the fd and restart everything from scratch. What's the
>>>> advice for lmdb?
>>>
>>> Does the parent process continue to use the DB after the fork?
>>> If the child is just going to exec, there's not much needed. You will
>>> leak a descriptor unless you close mdb_env_get_fd() before exec'ing.
> 
> Why isn't *CLOEXEC used for env->me_fd and env->me_mfd ?
> For me_fd we could set it ourself using mdb_env_get_fd(), but
> me_mfd is completely hidden.

CLOEXEC is set on me_mfd already. It is deliberately not set on me_fd since 
that's been exposed for users to manipulate.

>>> If both processes are using the DB, you will need to env_close before
>>> forking and reopen in both processes. Otherwise you'll mess up the
>>> fcntl lock on the lockfile.
> 
> With tdb we just need to close and reopen in the child, why is that a
> problem with fcntl locking?

I misspoke. You will mess up the lockfile, but not because of its fcntl lock.
>>> If you don't close in the parent, and only open in the child, things
>>> will work, but the child will leak the entire copy of the parent's
>>> environment.
> 
> Without a pending transaction I'd guess mdb_env_close() would be all we
> need in the child. But what would be the problems with that?

mdb_env_close() will reset any reader slots owned by the process. The PID is 
stored in env->me_pid so unfortunately both parent and child will have the 
same PID recorded here after a fork. Thus env_close in the child would end up 
releasing the parent's reader slots. Fairly disastrous. Obviously we can fix 
this trivially by just using getpid() in env_close().

>> I wrote the above before we had the discussion about adding a
>> _postfork() API. If you don't care about leaking the memory from any
>> active transactions, you could ignore that detail as well.
>>
>> There's also the case where you fork and then the parent exits, but our
>> proposed _postfork() would handle that as well.
> 
> What would the _postfork() do in detail compaired to mdb_env_close()?

Mainly it would reacquire the lockfile fcntl lock. There may be other cleanup 
needed, depends on the platform and which flavor of mutexes was used. Just to 
be clear, the only reason I suggested adding this _postfork() function is to 
allow the child process to keep using the existing env. If you just want to 
close the env() then there's really not much else needed.

-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/