Still some loose ends after commit ba53e284e68
Jeremy Allison
jra at samba.org
Fri Jan 10 23:34:10 UTC 2020
On Fri, Jan 10, 2020 at 10:46:54PM +0000, Christopher O Cowan - Christopher.O.Cowan--- via samba-technical wrote:
> I still getting some assertions pointing to rec->value_valid after compiling with the patch. Seems to be when I'm attempting to start smbd.
>
> I've attached the log from running smbd -i -d10, and a dbx stack trace.
> It appears that there additional callbacks that need to be cleaned up.
..
> Type 'help' for help.
> [using memory image in /tmp/core.6488334.10223352]
> reading symbolic information ...
>
> IOT/Abort trap in pthread_kill at 0xd054bb94 ($t1)
> 0xd054bb94 (pthread_kill+0xb4) 80410014 lwz r2,0x14(r1)
> pthread_kill(??, ??) at 0xd054bb94
> _p_raise(??) at 0xd054afe4
> raise.raise(??) at 0xd0121020
> abort() at 0xd017cae4
> dump_core(), line 338 in "dumpcore.c"
> smb_panic_s3(why = "assert failed: rec->value_valid"), line 853 in "util.c"
> smb_panic(why = "assert failed: rec->value_valid"), line 174 in "fault.c"
> dbwrap_record_get_value(rec = 0x2016e5f8), line 82 in "dbwrap.c"
> regdb_upgrade_v2_to_v3_fn(rec = 0x2016e5f8, private_data = 0x2016de28), line 617 in "reg_backend_db.c"
> traverse_persistent_callback(tdb = 0x2016e1f8, kbuf = (...), dbuf = (...), private_data = 0x2ff21fd8), line 1581 in "dbwrap_ctdb.c"
> unnamed block in tdb_traverse_internal(tdb = 0x2016e1f8, fn = 0xf0570044, private_data = 0x2ff21fd8, tl = 0x2ff21f70), line 222 in "traverse.c"
> tdb_traverse_internal(tdb = 0x2016e1f8, fn = 0xf0570044, private_data = 0x2ff21fd8, tl = 0x2ff21f70), line 222 in "traverse.c"
> tdb_traverse(tdb = 0x2016e1f8, fn = 0xf0570044, private_data = 0x2ff21fd8), line 295 in "traverse.c"
> unnamed block in db_ctdb_traverse(db = 0x2016de28, fn = 0x2004d1f8, private_data = 0x2016de28), line 1647 in "dbwrap_ctdb.c"
> db_ctdb_traverse(db = 0x2016de28, fn = 0x2004d1f8, private_data = 0x2016de28), line 1647 in "dbwrap_ctdb.c"
> dbwrap_traverse(db = 0x2016de28, f = 0x2004d1f8, private_data = 0x2016de28, count = (nil)), line 377 in "dbwrap.c"
> regdb_upgrade_v2_to_v3(db = 0x2016de28), line 706 in "reg_backend_db.c"
> regdb_init(), line 829 in "reg_backend_db.c"
> registry_init_common(), line 33 in "reg_init_basic.c"
> registry_init_full(), line 81 in "reg_init_full.c"
> main(argc = 3, argv = 0x2ff2255c), line 2020 in "server.c"
This backtrace is an interesting one.
It's going through traverse_persistent_callback(), which looks like:
1560 static int traverse_persistent_callback(TDB_CONTEXT *tdb, TDB_DATA kbuf, TDB_DATA dbuf,
1561 void *private_data)
1562 {
1563 struct traverse_state *state = (struct traverse_state *)private_data;
1564 struct db_record *rec;
1565 TALLOC_CTX *tmp_ctx = talloc_new(state->db);
1566 int ret = 0;
1567
1568 /*
1569 * Skip the __db_sequence_number__ key:
1570 * This is used for persistent transactions internally.
1571 */
1572 if (kbuf.dsize == strlen(CTDB_DB_SEQNUM_KEY) + 1 &&
1573 strcmp((const char*)kbuf.dptr, CTDB_DB_SEQNUM_KEY) == 0)
1574 {
1575 goto done;
1576 }
1577
1578 /* we have to give them a locked record to prevent races */
1579 rec = db_ctdb_fetch_locked(state->db, tmp_ctx, kbuf);
1580 if (rec && rec->value.dsize > 0) {
1581 ret = state->fn(rec, state->private_data);
^^^^^^^^^
Dies here.
rec comes from db_ctdb_fetch_locked(). If we look at this we see:
1257 static struct db_record *db_ctdb_fetch_locked(struct db_context *db,
1258 TALLOC_CTX *mem_ctx,
1259 TDB_DATA key)
1260 {
1261 struct db_ctdb_ctx *ctx = talloc_get_type_abort(db->private_data,
1262 struct db_ctdb_ctx);
1263
Case #1
1264 if (ctx->transaction != NULL) {
1265 return db_ctdb_fetch_locked_transaction(ctx, mem_ctx, key);
1266 }
1267
Case #2
1268 if (db->persistent) {
1269 return db_ctdb_fetch_locked_persistent(ctx, mem_ctx, key);
1270 }
1271
Case #3
1272 return fetch_locked_internal(ctx, mem_ctx, key, false);
1273 }
So rec can come from the returns of
case #1 db_ctdb_fetch_locked_transaction()
case #2 db_ctdb_fetch_locked_persistent()
case #3 fetch_locked_internal().
Examining each in turn:
Case #1
--------------------------------------------------
db_ctdb_fetch_locked_transaction() has:
557 result->value_valid = true;
558
559 SAFE_FREE(ctdb_data.dptr);
560
561 return result;
the only other return is:
542 /* create the record */
543 result->value = tdb_null;
544 return result;
which would return a rec with rec->value.dsize == 0
and thus not call state->fn() due to:
1580 if (rec && rec->value.dsize > 0) {
1581 ret = state->fn(rec, state->private_data);
Case #2
--------------------------------------------------
db_ctdb_fetch_locked_persistent() has:
591 rec = db_ctdb_fetch_locked_transaction(ctx, mem_ctx, key);
...
606 return rec;
so should have the same 'result->value_valid = true' set as it comes
from db_ctdb_fetch_locked_transaction() (case #1 above).
Case #3
--------------------------------------------------
fetch_locked_internal() is the most complex, but tries
to migrate the record and once it's fetched does:
1241 if (result->value.dsize != 0) {
1242 result->value.dptr = talloc_memdup(
1243 result, ctdb_data.dptr + sizeof(crec->header),
1244 result->value.dsize);
1245 if (result->value.dptr == NULL) {
1246 DBG_ERR("talloc failed\n");
1247 TALLOC_FREE(result);
1248 }
1249 }
1250 result->value_valid = true;
1251
1252 SAFE_FREE(ctdb_data.dptr);
1253
1254 return result;
which again ends up with the only way to get
a returned record with rec->value.dsize > 0
going through a code path that sets 'result->value_valid = true'.
So currently I don't see how you can get there
from here (if you see what I mean :-) ?
Is this crash reproducible ? If so, can you
try adding some debug level 0 statements to the different
cases in #1, #2, #3 above so I can see which path
it went through and in that how it ended up returning
a rec value with value_valid = false ?
There must be a way this is happening but currently
I can't see it :-(.
Cheers,
Jeremy.
More information about the samba-technical
mailing list