Still some loose ends after commit ba53e284e68

Jeremy Allison jra at samba.org
Fri Jan 10 23:34:10 UTC 2020


On Fri, Jan 10, 2020 at 10:46:54PM +0000, Christopher O Cowan - Christopher.O.Cowan--- via samba-technical wrote:
> I still getting some assertions pointing to rec->value_valid after compiling with the patch.   Seems to be when I'm attempting to start smbd.
> 
> I've attached the log from running smbd -i -d10, and a dbx stack trace.
> It appears that there additional callbacks that need to be cleaned up.
..
> Type 'help' for help.
> [using memory image in /tmp/core.6488334.10223352]
> reading symbolic information ...
> 
> IOT/Abort trap in pthread_kill at 0xd054bb94 ($t1)
> 0xd054bb94 (pthread_kill+0xb4) 80410014            lwz   r2,0x14(r1)
> pthread_kill(??, ??) at 0xd054bb94
> _p_raise(??) at 0xd054afe4
> raise.raise(??) at 0xd0121020
> abort() at 0xd017cae4
> dump_core(), line 338 in "dumpcore.c"
> smb_panic_s3(why = "assert failed: rec->value_valid"), line 853 in "util.c"
> smb_panic(why = "assert failed: rec->value_valid"), line 174 in "fault.c"
> dbwrap_record_get_value(rec = 0x2016e5f8), line 82 in "dbwrap.c"
> regdb_upgrade_v2_to_v3_fn(rec = 0x2016e5f8, private_data = 0x2016de28), line 617 in "reg_backend_db.c"
> traverse_persistent_callback(tdb = 0x2016e1f8, kbuf = (...), dbuf = (...), private_data = 0x2ff21fd8), line 1581 in "dbwrap_ctdb.c"
> unnamed block in tdb_traverse_internal(tdb = 0x2016e1f8, fn = 0xf0570044, private_data = 0x2ff21fd8, tl = 0x2ff21f70), line 222 in "traverse.c"
> tdb_traverse_internal(tdb = 0x2016e1f8, fn = 0xf0570044, private_data = 0x2ff21fd8, tl = 0x2ff21f70), line 222 in "traverse.c"
> tdb_traverse(tdb = 0x2016e1f8, fn = 0xf0570044, private_data = 0x2ff21fd8), line 295 in "traverse.c"
> unnamed block in db_ctdb_traverse(db = 0x2016de28, fn = 0x2004d1f8, private_data = 0x2016de28), line 1647 in "dbwrap_ctdb.c"
> db_ctdb_traverse(db = 0x2016de28, fn = 0x2004d1f8, private_data = 0x2016de28), line 1647 in "dbwrap_ctdb.c"
> dbwrap_traverse(db = 0x2016de28, f = 0x2004d1f8, private_data = 0x2016de28, count = (nil)), line 377 in "dbwrap.c"
> regdb_upgrade_v2_to_v3(db = 0x2016de28), line 706 in "reg_backend_db.c"
> regdb_init(), line 829 in "reg_backend_db.c"
> registry_init_common(), line 33 in "reg_init_basic.c"
> registry_init_full(), line 81 in "reg_init_full.c"
> main(argc = 3, argv = 0x2ff2255c), line 2020 in "server.c"

This backtrace is an interesting one.

It's going through traverse_persistent_callback(), which looks like:

1560 static int traverse_persistent_callback(TDB_CONTEXT *tdb, TDB_DATA kbuf, TDB_DATA dbuf,
1561                                         void *private_data)
1562 {
1563         struct traverse_state *state = (struct traverse_state *)private_data;
1564         struct db_record *rec;
1565         TALLOC_CTX *tmp_ctx = talloc_new(state->db);
1566         int ret = 0;
1567 
1568         /*
1569          * Skip the __db_sequence_number__ key:
1570          * This is used for persistent transactions internally.
1571          */
1572         if (kbuf.dsize == strlen(CTDB_DB_SEQNUM_KEY) + 1 &&
1573             strcmp((const char*)kbuf.dptr, CTDB_DB_SEQNUM_KEY) == 0)
1574         {
1575                 goto done;
1576         }
1577 
1578         /* we have to give them a locked record to prevent races */
1579         rec = db_ctdb_fetch_locked(state->db, tmp_ctx, kbuf);
1580         if (rec && rec->value.dsize > 0) {
1581                 ret = state->fn(rec, state->private_data);

                           ^^^^^^^^^
                           Dies here.

rec comes from db_ctdb_fetch_locked(). If we look at this we see:

1257 static struct db_record *db_ctdb_fetch_locked(struct db_context *db,
1258                                               TALLOC_CTX *mem_ctx,
1259                                               TDB_DATA key)
1260 {
1261         struct db_ctdb_ctx *ctx = talloc_get_type_abort(db->private_data,
1262                                                         struct db_ctdb_ctx);
1263 

Case #1

1264         if (ctx->transaction != NULL) {
1265                 return db_ctdb_fetch_locked_transaction(ctx, mem_ctx, key);
1266         }
1267 

Case #2

1268         if (db->persistent) {
1269                 return db_ctdb_fetch_locked_persistent(ctx, mem_ctx, key);
1270         }
1271 

Case #3

1272         return fetch_locked_internal(ctx, mem_ctx, key, false);
1273 }

So rec can come from the returns of

case #1 db_ctdb_fetch_locked_transaction()
case #2 db_ctdb_fetch_locked_persistent()
case #3 fetch_locked_internal().

Examining each in turn:

Case #1
--------------------------------------------------
db_ctdb_fetch_locked_transaction() has:

 557         result->value_valid = true;
 558 
 559         SAFE_FREE(ctdb_data.dptr);
 560 
 561         return result;

the only other return is:

 542                 /* create the record */
 543                 result->value = tdb_null;
 544                 return result;

which would return a rec with rec->value.dsize == 0
and thus not call state->fn() due to:

 1580         if (rec && rec->value.dsize > 0) {
 1581                 ret = state->fn(rec, state->private_data);


Case #2
--------------------------------------------------

db_ctdb_fetch_locked_persistent() has:

 591         rec = db_ctdb_fetch_locked_transaction(ctx, mem_ctx, key);
...
 606         return rec;

so should have the same 'result->value_valid = true' set as it comes
from db_ctdb_fetch_locked_transaction() (case #1 above).

Case #3
--------------------------------------------------

fetch_locked_internal() is the most complex, but tries
to migrate the record and once it's fetched does:

1241         if (result->value.dsize != 0) {
1242                 result->value.dptr = talloc_memdup(
1243                         result, ctdb_data.dptr + sizeof(crec->header),
1244                         result->value.dsize);
1245                 if (result->value.dptr == NULL) {
1246                         DBG_ERR("talloc failed\n");
1247                         TALLOC_FREE(result);
1248                 }
1249         }
1250         result->value_valid = true;
1251 
1252         SAFE_FREE(ctdb_data.dptr);
1253 
1254         return result;

which again ends up with the only way to get
a returned record with rec->value.dsize > 0
going through a code path that sets 'result->value_valid = true'.

So currently I don't see how you can get there
from here (if you see what I mean :-) ?

Is this crash reproducible ? If so, can you
try adding some debug level 0 statements to the different
cases in #1, #2, #3 above so I can see which path
it went through and in that how it ended up returning
a rec value with value_valid = false ?

There must be a way this is happening but currently
I can't see it :-(.

Cheers,

Jeremy.



More information about the samba-technical mailing list