files locked forever -- logs! :)

Thu Jun 21 18:29:38 GMT 2001

David Lee wrote:
-- snip --

> It does indeed seem to be trying to grab the oplock from another process,
> in this case 18071 .
>
> When this fault strikes, check whether that process (whatever its number)
> really exists.  (Indeed, if you have the log files from the above incident
> still lying around, you might even be able to trace that incident.)

Yeah, I still have the 1.5 GB log, but it takes 'grep' some time to find
anything in it :)). I've now split it into 3 parts with dd, though :).
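
For anyone else stuck with a log this size: below is a rough sketch of how the
entries for one pid could be pulled out in a single streaming pass instead of
repeated greps. It assumes the usual debug layout (a "[date time, level]"
header line followed by indented message lines); the names entries() and
entries_for_pid() are my own, not anything from Samba.

```python
import re
from typing import Iterable, Iterator, List

# Assumed layout: each entry starts with "[YYYY/MM/DD HH:MM:SS, level] ..."
# and its message text follows on the next (indented) lines.
HEADER = re.compile(r"^\[\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}, +\d+\] ")


def entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group a stream of log lines into entries (header plus body lines)."""
    current: List[str] = []
    for line in lines:
        if HEADER.match(line) and current:
            yield current
            current = []
        current.append(line.rstrip("\n"))
    if current:
        yield current


def entries_for_pid(lines: Iterable[str], pid: int) -> Iterator[List[str]]:
    """Yield only the entries whose text mentions 'pid <pid>'."""
    needle = re.compile(r"\bpid %d\b" % pid)
    for entry in entries(lines):
        if any(needle.search(text) for text in entry):
            yield entry
```

Passing an open file object (for example open(path, errors="replace"), with
path being wherever your log lives) as `lines` keeps memory use flat, since
only one entry is held at a time. Note that it only finds entries that name
the pid in their text, so it is a filter, not a full per-process trace.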

> See if there is a set of messages such as:
>
> [2001/06/18 14:16:23, 0] ../lib/fault.c:fault_report(40)
>   ===============================================================
> [2001/06/18 14:16:23, 0] ../lib/fault.c:fault_report(41)
>   INTERNAL ERROR: Signal 11 in pid 26673 (2.2.0)
>   Please read the file BUGS.txt in the distribution
> [2001/06/18 14:16:23, 0] ../lib/fault.c:fault_report(43)
>   ===============================================================
> [2001/06/18 14:16:23, 0] ../lib/util.c:smb_panic(1139)
>   PANIC: internal error

I couldn't find anything like this. The last thing I see in the logs about
pid 18071 (apart from the failed oplock_break messages sent to the process) is:
--cut --
[2001/06/21 12:34:39, 5] lib/util.c:show_msg(303)
  smb_vwv[11]=0 (0x0)
[2001/06/21 12:34:39, 5] lib/util.c:show_msg(308)
  smb_bcc=0
[2001/06/21 12:34:39, 3] smbd/process.c:switch_message(650)
  switch message SMBreadX (pid 18071)
[2001/06/21 12:34:39, 4] smbd/uid.c:become_user(114)
  Skipping become_user - already user
[2001/06/21 12:34:39, 10] smbd/fileio.c:seek_file(63)
  seek_file: requested pos = 3307008, new pos = 3307008
[2001/06/21 12:34:39, 10] lib/util_sock.c:read_smb_length_return_keepalive(602)
  got smb length of 146
-- etc --
Nothing special (after that), no errors or anything :/.
I restarted smb as soon as the apps were locked, so I also can't trace back
whether pid 18071 was still alive.
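
Next time, before restarting, the check could look something like this. It is
a small sketch for POSIX systems only, using signal 0, which performs the
existence and permission checks without delivering anything. pid_exists() is
my own helper name.

```python
import os


def pid_exists(pid: int) -> bool:
    """Return True if a process with this pid currently exists (POSIX only)."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        raise ValueError("pid must be positive")
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is sent
    except ProcessLookupError:
        return False     # ESRCH: no such process
    except PermissionError:
        return True      # EPERM: it exists but belongs to another user
    return True
```

Keep in mind that pids get reused, so a True answer long after the incident
may refer to a completely different process.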

> In my case this was pid 26673 (your 18071) reporting its own untimely
> demise.  As a result of this, I see other processes getting stuck when
> trying to grab oplocks from that (now absent) process.

I see. Maybe it should then just remove the oplock? Of course, only if it's
an exclusive oplock (so only the client [who just died] had that file locked
and it can be safely removed). Also, I understand the _real_ source of the
problem (some smbd process which crashed) should be corrected too, but
double protection against this is always nice in my opinion :).
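
To make the idea concrete, here is a toy model of what I mean. This is NOT how
smbd stores oplocks; the Oplock record, the table keyed by path and
drop_stale_oplocks() are all hypothetical, and the liveness check is passed in
as a function so any test (e.g. a signal-0 probe) can be plugged in.

```python
from typing import Callable, Dict, List, NamedTuple


class Oplock(NamedTuple):
    """Hypothetical oplock record, not Samba's real structure."""
    holder_pid: int
    exclusive: bool


def drop_stale_oplocks(table: Dict[str, Oplock],
                       is_alive: Callable[[int], bool]) -> List[str]:
    """Remove exclusive oplocks whose holder is gone; return the freed paths."""
    freed = [path for path, lock in table.items()
             if lock.exclusive and not is_alive(lock.holder_pid)]
    for path in freed:
        del table[path]
    return freed
```

Only exclusive oplocks of dead holders are dropped; everything else is left
alone, which is the "only if it can be safely removed" condition from above.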

> I understand (see other thread running on this list for last couple of
> days) that there is a known problem of this nature in 2.2.0, which is
> apparently corrected in the forthcoming release.

Sounds good :).

> Apparently, a process holding an oplock goes away unexpectedly:  it leaves
> behind the "======" log, but because of the nature of the exit, cannot
> clear its oplocks, which are left trailing.   Future processes don't
> detect this combination of circumstances (oplocks from deceased process)
> and themselves trip over...
>
> > I'm looking in the samba source code
> > for just two days or something.
>
> See if it classifies as above.  If so, your choices are probably:
>
> 1. backtrack to 2.0.7 (ideally 2.0.9);
>
> 2. try to live with and firefight the problem for now (what we are doing);
>    basically it means re-starting samba on your server.
>
> 3. (not recommended unless totally expert): update your source from CVS.

I added 'oplocks = no' to smb.conf. It may be a bit slower than with oplocks
(I heard 30% on average?), but until it's fixed that's OK with me :)

> > I hope somebody can look at it, since it's a bit difficult for me to
> > trace the error exactly (and to fix it).
>
> As I say, I'm totally inexpert on oplocks.  But it sounds something like a
> "known problem", which may be firefightable until the next release.

Hope so, then I will be able to really enjoy the new samba 2.2 (:

Thanks,

    Syzop.