[PATCH] Patch for bug 13414: MSG_SMB_UNLOCK bounces between cleanupds in a cluster

Ralph Böhme slow at samba.org
Wed May 2 10:00:43 UTC 2018


Hi!

I noticed an issue with messages bouncing between cleanupds in a cluster running
master. Killing a single smbd (with SIGKILL) in a cluster results in
MSG_SMB_UNLOCK messages bouncing between nodes:

Node 0:

[2018/04/30 15:44:57.824893,  1, pid=1646] ../source3/smbd/server.c:881(remove_child_pid)
  Scheduled cleanup of brl and lock database after unclean shutdown

...

[2018/04/30 15:45:17.836225,  1, pid=1654] ../source3/smbd/smbd_cleanupd.c:99(smbd_cleanupd_unlock)
  smbd_cleanupd_unlock: Cleaning up brl and lock database after unclean shutdown
[2018/04/30 15:45:17.843968,  1, pid=1654] ../source3/smbd/smbd_cleanupd.c:99(smbd_cleanupd_unlock)
  smbd_cleanupd_unlock: Cleaning up brl and lock database after unclean shutdown
[2018/04/30 15:45:17.847836,  1, pid=1654] ../source3/smbd/smbd_cleanupd.c:99(smbd_cleanupd_unlock)
  smbd_cleanupd_unlock: Cleaning up brl and lock database after unclean shutdown
[2018/04/30 15:45:17.851000,  1, pid=1654] ../source3/smbd/smbd_cleanupd.c:99(smbd_cleanupd_unlock)
  smbd_cleanupd_unlock: Cleaning up brl and lock database after unclean shutdown
[2018/04/30 15:45:17.854174,  1, pid=1654] ../source3/smbd/smbd_cleanupd.c:99(smbd_cleanupd_unlock)
  smbd_cleanupd_unlock: Cleaning up brl and lock database after unclean shutdown
...

Node 1:

[2018/04/30 15:45:17.837851,  1, pid=19632] ../source3/smbd/smbd_cleanupd.c:99(smbd_cleanupd_unlock) 
  smbd_cleanupd_unlock: Cleaning up brl and lock database after unclean shutdown                     
[2018/04/30 15:45:17.844705,  1, pid=19632] ../source3/smbd/smbd_cleanupd.c:99(smbd_cleanupd_unlock) 
  smbd_cleanupd_unlock: Cleaning up brl and lock database after unclean shutdown                     
[2018/04/30 15:45:17.848739,  1, pid=19632] ../source3/smbd/smbd_cleanupd.c:99(smbd_cleanupd_unlock) 
  smbd_cleanupd_unlock: Cleaning up brl and lock database after unclean shutdown                     
[2018/04/30 15:45:17.853906,  1, pid=19632] ../source3/smbd/smbd_cleanupd.c:99(smbd_cleanupd_unlock) 
  smbd_cleanupd_unlock: Cleaning up brl and lock database after unclean shutdown                     
[2018/04/30 15:45:17.859117,  1, pid=19632] ../source3/smbd/smbd_cleanupd.c:99(smbd_cleanupd_unlock) 
  smbd_cleanupd_unlock: Cleaning up brl and lock database after unclean shutdown                     
...

This was introduced by the patches around
6423ca4bf293cac5e2f84b1a37bb29b06b5c05ed: as far as I can tell,
messaging_send_all() now broadcasts across the whole cluster, which it
didn't do before.

The attached patch fixes the issue by changing the cleanup trigger
notification that smbd sends to cleanupd from MSG_SMB_UNLOCK to the
(currently unused) MSG_SMB_BRL_VALIDATE message type; see the commit
message for details.

Please review carefully and push if ok. Thanks!

-slow

-- 
Ralph Boehme, Samba Team       https://samba.org/
Samba Developer, SerNet GmbH   https://sernet.de/en/samba/
GPG Key Fingerprint:           FAE2 C608 8A24 2520 51C5
                               59E4 AA1E 9B71 2639 9E46
-------------- next part --------------
From 00de80792c105009e520845340c3a620de935f83 Mon Sep 17 00:00:00 2001
From: Ralph Boehme <slow at samba.org>
Date: Mon, 30 Apr 2018 19:03:41 +0200
Subject: [PATCH] s3:cleanupd: use MSG_SMB_BRL_VALIDATE to signal cleanupd
 unclean process shutdown

Since 6423ca4bf293cac5e2f84b1a37bb29b06b5c05ed messaging_send_all()
broadcasts messages in a cluster, so cleanupd receives those broadcasts
and acts upon them by re-broadcasting the message. Result: message
storm.

By reactivating the currently unused MSG_SMB_BRL_VALIDATE for the
trigger message to cleanupd we avoid the storm.

Note that MSG_SMB_BRL_VALIDATE was unused only in the sense that no one
*listened* to it, but we were still *sending* the message in
smbd_parent_ctdb_reconfigured(). de6fe2a1dd6ab03b1c369b61da17fded72305b2d
removed listening for MSG_SMB_BRL_VALIDATE from cleanupd. This commit
brings it back.

Bug: https://bugzilla.samba.org/show_bug.cgi?id=13414

Signed-off-by: Ralph Boehme <slow at samba.org>
---
 source3/smbd/server.c        | 2 +-
 source3/smbd/smbd_cleanupd.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/source3/smbd/server.c b/source3/smbd/server.c
index e7e297f1f18..07d7136ef41 100644
--- a/source3/smbd/server.c
+++ b/source3/smbd/server.c
@@ -757,7 +757,7 @@ static void cleanup_timeout_fn(struct tevent_context *event_ctx,
 	parent->cleanup_te = NULL;
 
 	messaging_send_buf(parent->msg_ctx, parent->cleanupd,
-			   MSG_SMB_UNLOCK, NULL, 0);
+			   MSG_SMB_BRL_VALIDATE, NULL, 0);
 }
 
 static void cleanupd_started(struct tevent_req *req)
diff --git a/source3/smbd/smbd_cleanupd.c b/source3/smbd/smbd_cleanupd.c
index 5bd18c1411c..a9b1e8a1137 100644
--- a/source3/smbd/smbd_cleanupd.c
+++ b/source3/smbd/smbd_cleanupd.c
@@ -71,7 +71,7 @@ struct tevent_req *smbd_cleanupd_send(TALLOC_CTX *mem_ctx,
 		return tevent_req_post(req, ev);
 	}
 
-	status = messaging_register(msg, NULL, MSG_SMB_UNLOCK,
+	status = messaging_register(msg, NULL, MSG_SMB_BRL_VALIDATE,
 				    smbd_cleanupd_unlock);
 	if (tevent_req_nterror(req, status)) {
 		return tevent_req_post(req, ev);
-- 
2.13.6


