[PATCHES] Fix smbd open race and render defer_open test less flakey
Michael Adam
obnox at samba.org
Wed Sep 10 05:59:21 MDT 2014
Addition:
Metze and I are working on a unification and
systematization of the timer treatment in our
torture tests.
This will make all those timing-critical tests less arbitrary...
Cheers - Michael
On 2014-09-10 at 13:57 +0200, Michael Adam wrote:
> Hi,
>
> The defer_open test is currently one of the most frequent
> causes of autobuild failure.
>
> I have just brought a few patches upstream (reviewed by Metze)
> that correct the defer_open test to reliably produce
> toreture-compatible "failure" instead of "error"
> if something goes wrong.
> (These reason for this was in the missing tracking of
> child process status in the test wrapper not in the
> test itself, so this might also have positive effects
> on a few other tests.)
>
> Now these changes gave me the opportunity to finally
> start analyzing what is going on.
> There are two causes of failure which I frequently saw:
>
> 1. return code OBJECT_NAME_NOT FOUND for the create call.
> 2. reply with SHARING_VIOLATION took too long
>
> The first is due to a race in open_file() which I
> fixed in the other attached patch: The file vanished
> between check for existence and the acl check.
> ==> review/push/comments appreciated.
>
> The second I have watered down by increasing the
> absolute (not-scaled) generosity from 1 second to 1.5 seconds
> in the test case.
> ==> patch attached.
>
> These patches allowed me to run the defer_open test successfully
> for several hundred times in a row on our autobuild-host!
> (Previously it failed reliably on that machine before it reached 10.)
>
> I just observed a third failure:
> create results in NT_STATUS_ACCESS_DENIED.
> I suspect that there is another race in the open code.
> I will investigate further, but this seems to be very rare.
>
> Cheers - Michael
>
> From 393cf1ba762e17c32cfdb58add0d5eaec5c3bff8 Mon Sep 17 00:00:00 2001
> From: Michael Adam <obnox at samba.org>
> Date: Wed, 10 Sep 2014 00:31:25 +0200
> Subject: [PATCH 1/2] s3:smbd: fix a race in open code
>
> The race is when a file vanishes between
> existence check and acl check.
>
> In this case, open_file_ncreate() returns
> OBJECT_NAME_NOT_FOUND even if the create
> was called with disposition OPEN_IF.
> But in this case, the file should be created.
>
> Signed-off-by: Michael Adam <obnox at samba.org>
> ---
> source3/smbd/open.c | 59 ++++++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 42 insertions(+), 17 deletions(-)
>
> diff --git a/source3/smbd/open.c b/source3/smbd/open.c
> index 2762519..01b4cc4 100644
> --- a/source3/smbd/open.c
> +++ b/source3/smbd/open.c
> @@ -819,24 +819,49 @@ static NTSTATUS open_file(files_struct *fsp,
> smb_fname,
> false,
> access_mask);
> - } else if (local_flags & O_CREAT){
> - status = check_parent_access(conn,
> - smb_fname,
> - SEC_DIR_ADD_FILE);
> - } else {
> - /* File didn't exist and no O_CREAT. */
> - return NT_STATUS_OBJECT_NAME_NOT_FOUND;
> +
> + if (!NT_STATUS_IS_OK(status)) {
> + DEBUG(10, ("open_file: "
> + "smbd_check_access_rights "
> + "on file %s returned %s\n",
> + smb_fname_str_dbg(smb_fname),
> + nt_errstr(status)));
> + }
> +
> + if (!NT_STATUS_IS_OK(status) &&
> + !NT_STATUS_EQUAL(status,
> + NT_STATUS_OBJECT_NAME_NOT_FOUND))
> + {
> + return status;
> + }
> +
> + if (!NT_STATUS_IS_OK(status)) {
> + DEBUG(10, ("open_file: "
> + "file %s vanished since we "
> + "checked for existence.\n",
> + smb_fname_str_dbg(smb_fname)));
> + file_existed = false;
> + SET_STAT_INVALID(fsp->fsp_name->st);
> + }
> }
> - if (!NT_STATUS_IS_OK(status)) {
> - DEBUG(10,("open_file: "
> - "%s on file "
> - "%s returned %s\n",
> - file_existed ?
> - "smbd_check_access_rights" :
> - "check_parent_access",
> - smb_fname_str_dbg(smb_fname),
> - nt_errstr(status) ));
> - return status;
> +
> + if (!file_existed) {
> + if (!(local_flags & O_CREAT)) {
> + /* File didn't exist and no O_CREAT. */
> + return NT_STATUS_OBJECT_NAME_NOT_FOUND;
> + }
> +
> + status = check_parent_access(conn,
> + smb_fname,
> + SEC_DIR_ADD_FILE);
> + if (!NT_STATUS_IS_OK(status)) {
> + DEBUG(10, ("open_file: "
> + "check_parent_access on "
> + "file %s returned %s\n",
> + smb_fname_str_dbg(smb_fname),
> + nt_errstr(status) ));
> + return status;
> + }
> }
> }
>
> --
> 1.9.1
>
> From d22397e4e8852196b72cbd41ee3b3e48b9f7f411 Mon Sep 17 00:00:00 2001
> From: Michael Adam <obnox at samba.org>
> Date: Wed, 10 Sep 2014 01:03:57 +0200
> Subject: [PATCH 2/2] s4:torture:base: slightly more generous timing in the
> defer_open test
>
> This copes with cases where the server is very busy and
> can't provide tortures more tight time scaling..
> This is an attepmt to remove the flapping character of this test.
>
> Signed-off-by: Michael Adam <obnox at samba.org>
> ---
> source4/torture/basic/base.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/source4/torture/basic/base.c b/source4/torture/basic/base.c
> index bd9a304..6a792b2 100644
> --- a/source4/torture/basic/base.c
> +++ b/source4/torture/basic/base.c
> @@ -697,7 +697,7 @@ static bool run_deferopen(struct torture_context *tctx, struct smbcli_state *cli
> torture_comment(tctx, "pid %u: create[%d,%d] "
> "time elapsed: %.2f (1 sec = %.2f)\n",
> (unsigned)getpid(), i, j, e, sec);
> - if (e < (0.5 * sec) || e > ((1.5 * sec) + 1)) {
> + if (e < (0.5 * sec) || e > ((1.5 * sec) + 1.5)) {
> torture_comment(tctx, "pid %u: create[%d,%d] "
> "timing incorrect\n",
> (unsigned)getpid(), i, j);
> --
> 1.9.1
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20140910/0f06ac36/attachment.pgp>
More information about the samba-technical
mailing list