dbwrap_record_watch_send/recv

Wed Feb 15 08:59:41 MST 2012

Hi!

Under

http://git.samba.org/?p=vl/samba.git/.git;a=shortlog;h=refs/heads/dbwrap_record_watch

find a patchset that I've been working on for a while now.

It implements the following API:

struct tevent_req *dbwrap_record_watch_send(TALLOC_CTX *mem_ctx,
                                            struct tevent_context *ev,
                                            struct db_record *rec,
                                            struct messaging_context *msg);
NTSTATUS dbwrap_record_watch_recv(struct tevent_req *req,
                                  TALLOC_CTX *mem_ctx,
                                  struct db_record **prec);

The central idea is that you can asynchronously wait for a
dbwrap based tdb record to change. The top commit in the git
branch explains a bit why I've done this:

> This simplifies the g_lock implementation. The new
> implementation tries to acquire a lock. If that fails due
> to a lock conflict, wait for the g_lock record to change.
> Upon change, just try again. The old logic had to cope
> with pending records and an ugly hack into ctdb itself. As
> a bonus, we now get a really clean async
> g_lock_lock_send/recv that can asynchronously wait for a
> global lock. This would have been almost impossible to do
> without the dbwrap_record_watch infrastructure.

Just for the g_lock implementation it would not have been
worth the trouble to implement that API and the
infrastructure around it, but if you look at our share mode
and oplock implementation, a lot of the custom smbd messages
can be replaced by the new API. For example we have a
special message to inform a second opener about an oplock
being released. This can be simplified by sending a message
to the oplock holder and then watching the record for a change.
After a change, just retry. Same holds true for timed byte
range locks and a few others.
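
To illustrate the "try, watch on conflict, retry on change" idea,
here is a minimal toy sketch in plain C. It is not the code from
the branch: toy_record, toy_record_watch, toy_record_store and
toy_lock_try are made-up names, and the watchers are plain
callbacks in one process instead of tevent requests and messages
between processes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not Samba code: one record holding a lock owner. */
#define MAX_WATCHERS 8

struct toy_record;
typedef void (*toy_watch_fn)(struct toy_record *rec, void *private_data);

struct toy_watcher {
	toy_watch_fn fn;
	void *private_data;
};

struct toy_record {
	int owner;			/* 0 means "not locked" */
	struct toy_watcher watchers[MAX_WATCHERS];
	size_t num_watchers;
};

/* Register a one-shot watcher that fires on the next record change. */
static void toy_record_watch(struct toy_record *rec, toy_watch_fn fn,
			     void *private_data)
{
	assert(rec->num_watchers < MAX_WATCHERS);
	rec->watchers[rec->num_watchers].fn = fn;
	rec->watchers[rec->num_watchers].private_data = private_data;
	rec->num_watchers++;
}

/* Change the record, then wake every watcher registered so far once. */
static void toy_record_store(struct toy_record *rec, int owner)
{
	struct toy_watcher pending[MAX_WATCHERS];
	size_t i, n = rec->num_watchers;

	rec->owner = owner;
	for (i = 0; i < n; i++) {
		pending[i] = rec->watchers[i];
	}
	rec->num_watchers = 0;	/* woken watchers may register again */
	for (i = 0; i < n; i++) {
		pending[i].fn(rec, pending[i].private_data);
	}
}

struct toy_lock_state {
	int self;	/* id of the locker, must be nonzero */
	int attempts;
	int locked;
};

/* Try to take the lock. On conflict, watch the record and retry on change. */
static void toy_lock_try(struct toy_record *rec, void *private_data)
{
	struct toy_lock_state *state = private_data;

	state->attempts++;
	if (rec->owner == 0) {
		/* Simplification: taking the lock does not wake watchers. */
		rec->owner = state->self;
		state->locked = 1;
		return;
	}
	toy_record_watch(rec, toy_lock_try, state);
}

/* Owner 1 holds the lock, locker 2 waits and gets it after the unlock. */
static int toy_demo(void)
{
	struct toy_record rec = { .owner = 1 };
	struct toy_lock_state waiter = { .self = 2 };

	toy_lock_try(&rec, &waiter);	/* conflict, waiter starts watching */
	toy_record_store(&rec, 0);	/* unlock wakes the waiter, it retries */
	return waiter.locked ? waiter.attempts : -1;
}
```

The waiter needs no knowledge of who holds the lock or why it was
released. It only reacts to "the record changed" and tries again,
which is what makes the same pattern usable for oplocks and timed
byte range locks.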

What fell out of this work is the start of a reworked
messaging API: we now have msg_read_send/recv, a tevent_req
based version of messaging_register.

One consequence of using this API throughout smbd would be a
vastly improved cleanup behaviour after a crashed smbd.
Right now we have custom code to periodically walk the
brlock database. We do not have code to walk locking.tdb,
for good reason. You just don't want to traverse a database
of 100,000 open files when maybe 100 of those are waiting
for an oplock break. By using the dbwrap_watchers.tdb for
everyone waiting for a change, it becomes much more feasible
to walk this whole db and wake up all waiters whenever an
smbd dies or a node goes down.

This patchset is not perfect yet. One piece still missing is
proper cleanup: the code does not yet clean up stale entries
when a waiter dies hard.

Comments?

Volker

-- 
SerNet GmbH, Bahnhofsallee 1b, 37081 Göttingen
phone: +49-551-370000-0, fax: +49-551-370000-9
AG Göttingen, HRB 2816, GF: Dr. Johannes Loxen
http://www.sernet.de, mailto:kontakt at sernet.de