We are seeing timed event problems with chdir that suggest event ordering and chdir issues in Samba 3.5.15 and possibly other versions of Samba

Richard Sharpe realrichardsharpe at gmail.com
Tue Jun 12 11:16:10 MDT 2012

On Mon, Jun 11, 2012 at 5:23 PM, Richard Sharpe
<realrichardsharpe at gmail.com> wrote:
> On Mon, Jun 11, 2012 at 5:10 PM, Jeremy Allison <jra at samba.org> wrote:
>> On Mon, Jun 11, 2012 at 04:45:29PM -0700, Richard Sharpe wrote:
>>> It might be us misunderstanding how chdir is supposed to work in the VFS.
>>> When we see a chdir in the VFS we capture, globally, for all shares
>>> that go through the VFS, the path that will be prepended to any
>>> relative path that comes down to us.
>>> Then, if we are given a relative path in an operation through the VFS
>>> (one that does not begin with /) we prepend the path captured when
>>> chdir was called.
>>> We see some cases where the initial portion of the path we prepend
>>> (which was captured in the most recent chdir) does not match the
>>> prefix for the share we have just been asked to work on.
>>> For example, we see a lot of these:
>>>  chdir to: /containers/torture-22
>>>  chdir to: /
>>>  chdir to: /containers/torture-23
>>>  chdir to: /
>>>  chdir to: /containers/torture-24
>>>  chdir to: /
>>>  chdir to: /containers/torture-25
>>>  chdir to: /
>>>  chdir to: /containers/torture-26
>>>  chdir to: /
>>>  chdir to: /containers/torture-21
>>>  chdir to: /
>>> Where we were told to chdir to those places ... and then told to go
>>> back to /. Each of those is a separate share.
>>> Sometimes we see cases where the most recent chdir was to
>>> /containers/torture-24 but the operation we were asked to perform is
>>> on the share for /containers/torture-21 and the path is ./somename.
>> That looks like a missing set_current_service() call (which
>> is what should trigger the chdir to the current service being
>> processed).
>> Can you add in an abort when you see this condition (the
>> operation not matching the share path) and get
>> a backtrace ? A debug log will also help - and a
>> network capture trace.
> Yeah. I have added log_stack_trace at a couple of places and can
> easily get the debug log. In fact I have a debug log already. However,
> the network capture is harder because although the problem turns up
> reliably in our torture environment, it is not clear how many packets
> need to be captured before we see it.
> Anyway, I will keep trying to get more info about this.

OK, attached is a snippet of a log level 10, that shows:

1. chdir to /containers/torture-23 and schedule and update for 2 seconds later:

Update write time 2000000 usec later on w2k3-801-1m-9226

2. Do some writes on the same file ...

3. Then we chdir to /containers/torture-24 (just before that we
changed contexts)

4. Do some work on w2k3-801-96m-29061

5. Then the write-time-update fires and we try to update the write
time on w2k3-801-1m-9226:

[2012/06/11 12:27:21.536237,  5] smbd/fileio.c:181(update_write_time_handler)
  Update write time on w2k3-801-1m-9226

However, there is no chdir to the path where that file exists.

So, it looks like we did indeed forget to do a chdir somewhere.

Richard Sharpe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: evidence-dropped-chdir.log10
Type: application/octet-stream
Size: 81189 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20120612/ff493c3f/attachment-0001.obj>

More information about the samba-technical mailing list