RFC: dbwrap_ctdb and empty vs deleted records

Fri Jul 22 21:49:34 UTC 2016

On Fri, Jul 22, 2016 at 04:42:35PM +0200, Michael Adam wrote:
> On 2016-07-22 at 14:59 +0200, Ralph Böhme wrote:
> > On Fri, Jul 22, 2016 at 02:45:51PM +0200, Volker Lendecke wrote:
> > > On Wed, Jul 20, 2016 at 03:14:02PM +0200, Ralph Böhme wrote:
> > > > On Mon, Jul 04, 2016 at 01:43:00PM +0200, Ralph Boehme wrote:
> > > > > I *think* my patch might be a proper fix without the risk of a
> > > > > deadlock, because it *won't* call out to ctdb but return ENOENT (in
> > > > > terms of NTSTATUS).
> > > > > 
> > > > > I'd highly appreciate some feedback. In case we don't want to take the
> > > > > risk of this change, I'll prepare a patch for parse_share_modes() and
> > > > > callers.
> > > > 
> > > > *ping*
> > > 
> > > I like your patch. Samba can live without empty records, and your patch
> > > solves this really bad problem. However, and I don't want to block it just
> > > for that, reading 925625b52886d40b50fc's commit message this deadlock
> > > came as a bad surprise. Do we have sufficient information to reproduce
> > > that deadlock, just to make sure with your patch this does not happen?
> > 
> > yes, that would be really helpful if anyone who worked on
> > 925625b52886d40b50fc would remember how this could be reproduced.
> > 
> > Michael? Björn?
> 
> I have to admit that I don't fully remember.
> Well, it's 2.5 years ago... ;-)

bitrot or what? :)

> Thinking aloud:
> 
> The example in the commit msg is from brlock code.
> We'd need to lock a file, release it, have an
> empty brl record, and before it gets vacuumed
> call do_lock on the file again.
> 
> So possibly by a long vacuum interval and a specially
> crafted sequence of file ops...

hm, my reading of the commit message of the revert was a bit
different:

    do_lock()
      -> grabs lock on brl record with brl_get_locks()
        -> calls brl_lock()
          -> calls brl_lock_posix or _windows_default()
            -> calls contend_level2_oplocks_begin()
              -> calls brl_locks_get_read_only()
                -> calls dbwrap_parse_record on the same brl record as above

This suggests that this can be triggered by a single request calling
into do_lock(). Can it?
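
To make the question concrete, here is a toy model of the call chain
above. It is only a sketch and does not use the real dbwrap or ctdb
APIs: every toy_* name is made up, and the premise that the call-out
for an empty record needs the same record lock the caller already holds
is my assumption for illustration, not something the commit message
states. The model reports TOY_WOULD_DEADLOCK where a real non-recursive
lock would block forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a db record guarded by a non-recursive lock. */
struct toy_record {
	bool locked;       /* true while someone holds the record lock */
	const char *data;  /* NULL models an empty (not yet vacuumed) record */
};

enum toy_status { TOY_OK, TOY_NOT_FOUND, TOY_WOULD_DEADLOCK };

/* Take the record lock; report the deadlock instead of blocking forever. */
static enum toy_status toy_fetch_locked(struct toy_record *rec)
{
	if (rec->locked) {
		return TOY_WOULD_DEADLOCK;
	}
	rec->locked = true;
	return TOY_OK;
}

static void toy_unlock(struct toy_record *rec)
{
	assert(rec->locked);
	rec->locked = false;
}

/*
 * Read-only parse. With call_out_on_empty set, an empty record triggers
 * a "remote" lookup that (by assumption) needs the record lock. Without
 * it, an empty record is simply reported as not found.
 */
static enum toy_status toy_parse_record(struct toy_record *rec,
					bool call_out_on_empty)
{
	enum toy_status status;

	if (rec->data != NULL) {
		return TOY_OK;         /* local copy is usable */
	}
	if (!call_out_on_empty) {
		return TOY_NOT_FOUND;  /* treat empty like deleted */
	}
	status = toy_fetch_locked(rec);
	if (status != TOY_OK) {
		return status;         /* caller already holds the lock */
	}
	toy_unlock(rec);
	return TOY_NOT_FOUND;
}

/* Models do_lock(): hold the record lock, then parse the same record. */
static enum toy_status toy_do_lock(struct toy_record *rec,
				   bool call_out_on_empty)
{
	enum toy_status status = toy_fetch_locked(rec);

	if (status != TOY_OK) {
		return status;
	}
	status = toy_parse_record(rec, call_out_on_empty);
	toy_unlock(rec);
	return status;
}
```

In this model a single toy_do_lock() call on an empty record hits the
deadlock case when call_out_on_empty is true, and gets TOY_NOT_FOUND
when it is false. Whether the real code path behaves the same way is
exactly the open question.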

Cheerio!
-slow